Good day! Six months ago I started learning machine learning, took a couple of courses, and got some hands-on experience. Then, seeing all sorts of news about what neural networks are capable of, I decided to study them. I started reading Nikolenko's book on deep learning, and while reading I had a few ideas (not new to the world, but of great interest to me). One of them was to create a neural network that would generate art that looked cool not only to me, the "father of the drawing child," but to other people as well. In this article I will try to describe the path I followed to get the first results that satisfied me.
When I read the chapter on adversarial networks, I realized that now I could write something of my own.
One of the first tasks was to write a web page parser to collect the dataset. The wikiart site suited this perfectly: it has a large number of paintings, all grouped by style. This was my first parser, so I spent 4-5 days writing it, the first 3 of which I spent poking along a completely wrong path. The right way was to open the Network tab in the browser's developer tools and watch how the images arrive when you click the "more" button. Actually, for beginners like me, it will be useful to show the code.
from scipy.misc import imresize, imsave
from matplotlib.image import imread
import requests
import json
from bs4 import BeautifulSoup
from itertools import count
import os
import glob
In the first cell of the Jupyter notebook, I imported the necessary libraries.
def get_page(style, pagenum):
    page = requests.get(url1 + style + url2 + str(pagenum) + url3)
    return page

def make_soup(page):
    soup = BeautifulSoup(page.text, 'html5lib')
    return soup

def make_dir(name, s):
    path = os.getcwd() + '/' + s + '/' + name
    os.mkdir(path)
Next I define a few helper functions. The first gets a page as text, the second makes this text more convenient to work with, and the third creates the necessary folders for the styles.
styles = ['kubizm']
url1 = 'https://www.wikiart.org/ru/paintings-by-style/'
url2 = '?select=featured&json=2&layout=new&page='
url3 = '&resultType=masonry'
The styles array was supposed to contain several styles, but it turned out that I had downloaded them very unevenly.
for style in styles:
    make_dir(style, 'images')
for style in styles:
    make_dir(style, 'new256_images')
Here we create the necessary folders. The second loop creates the folders in which the images will be saved, resized to 256x256 squares.

(At first I thought about somehow not normalizing the image sizes so that there would be no distortion, but I realized that this was either impossible or too difficult for me.)
for style in styles:
    path = os.getcwd() + '\\images\\' + style + '\\'
    images = []
    names = []
    titles = []
    for pagenum in count(start=1):
        page = get_page(style, pagenum)
        if page.text[0] != '{':
            break
        jsons = json.loads(page.text)
        paintings = jsons['Paintings']
        if paintings is None:
            break
        for item in paintings:
            images_temp = []
            images_dict = item['images']
            if images_dict is None:
                images_temp.append(item['image'])
                names.append(item['artistName'])
                titles.append(item['title'])
            else:
                for inner_item in images_dict:
                    images_temp.append(inner_item['image'])
                # the name and title are appended once per album, so that
                # images, names and titles stay aligned for the zip below
                names.append(item['artistName'])
                titles.append(item['title'])
            images.append(images_temp)
    # strip characters that are not allowed in file names
    for char in ['/', '\\', '"', '?', ':', '*', '|', '<', '>']:
        titles = [title.replace(char, ' ') for title in titles]
    for listimg, name, title in zip(images, names, titles):
        if len(name) > 30:
            name = name[:25]
        if len(title) > 50:
            title = title[:50]
        if len(listimg) == 1:
            response = requests.get(listimg[0])
            if response.status_code == 200:
                with open(path + name + ' ' + title + '.png', 'wb') as f:
                    f.write(response.content)
            else:
                print('Error from server')
        else:
            for i, img in enumerate(listimg):
                response = requests.get(img)
                if response.status_code == 200:
                    with open(path + name + ' ' + title + str(i) + '.png', 'wb') as f:
                        f.write(response.content)
                else:
                    print('Error from server')
This cell downloads the pictures and saves them to the desired folder. The pictures are not resized here; the originals are kept.
Interesting things happen in the first nested loop: I decided to simply keep requesting JSON responses (the JSON is the dictionary the server returns when you click the "more" button; it holds all the information about the pictures) and to stop as soon as the server returns something unintelligible that doesn't look like the typical values. In a valid response, the first character of the returned text should be an opening brace, followed by the body of the dictionary.

It also turned out that the server can return something like an album of pictures, that is, in effect an array of paintings. At first I thought only single paintings were returned, each with its artist's name, but it may happen that an array of paintings is returned under the same artist's name.
for style in styles:
    directory = os.getcwd() + '\\images\\' + style + '\\'
    new_dir = os.getcwd() + '\\new256_images\\' + style + '\\'
    filepaths = []
    for dir_, _, files in os.walk(directory):
        for fileName in files:
            filepaths.append(fileName)
    print(filepaths[0])
    for i, fp in enumerate(filepaths):
        img = imread(directory + fp)
        img = imresize(img, (256, 256))
        imsave(new_dir + str(i) + ".png", img)
Here the images are resized to 256x256 and saved into the folder prepared for them.
Well, the dataset is assembled; we can move on to the most interesting part!
Next, after reading the original article, I started creating! But what a disappointment it was when nothing good came of it. In those attempts I trained the network on pictures of a single style, but even so nothing worked, so I decided to start by learning to generate digits from MNIST. I won't dwell on it in detail here; I will only describe the architecture and the turning point thanks to which the digits finally started to be generated.
def build_generator():
    model = Sequential()
    model.add(Dense(128 * 7 * 7, input_dim=latent_dim))
    model.add(BatchNormalization())
    model.add(LeakyReLU())
    model.add(Reshape((7, 7, 128)))
    model.add(Conv2DTranspose(64, filter_size, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(32, filter_size, strides=(1, 1), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(img_channels, filter_size, strides=(2, 2), padding='same'))
    model.add(Activation("tanh"))
    model.summary()
    return model
latent_dim is the size of the generator's input: a vector of 100 randomly generated numbers.
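To make this concrete, here is a minimal illustration of how such a noise vector is sampled; it is the same np.random.normal call that appears later in the training loop, with batch_size as an example value:

import numpy as np

latent_dim = 100  # dimension of the noise vector
batch_size = 4    # example value

# each row is one 100-dimensional vector drawn from N(0, 1);
# the generator maps each row to one 28x28 image
noise = np.random.normal(0, 1, (batch_size, latent_dim))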
def build_discriminator():
    model = Sequential()
    model.add(Conv2D(64, kernel_size=filter_size, strides=(2, 2), input_shape=img_shape, padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=filter_size, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=filter_size, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()
    return model
Note that the total size of the convolutional layers' outputs and the number of layers are smaller than in the original article. After all, it is 28x28 digits I am generating, not interiors!
And here is the trick thanks to which everything worked out: on even training iterations, the discriminator was shown generated pictures, and on odd ones, real ones.
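Since I don't show the MNIST training loop itself, here is a minimal sketch of what this alternation looked like. The names (discriminator, generator, combined, X_train) and the exact label values are illustrative assumptions, not my exact code:

# assumes `generator`, `discriminator` and the stacked `combined` model are
# compiled Keras models, and X_train holds MNIST images scaled to [-1, 1]
for iteration in range(num_iterations):
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    if iteration % 2 == 0:
        # even iteration: the discriminator sees only generated pictures
        X = generator.predict(noise)
        y = np.zeros(batch_size)   # 'fake' labels
    else:
        # odd iteration: the discriminator sees only real pictures
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        X = X_train[idx]
        y = np.ones(batch_size)    # 'real' labels
    discriminator.train_on_batch(X, y)
    # the generator is trained through the combined model on every iteration
    combined.train_on_batch(noise, np.ones(batch_size))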
That's basically it. This DCGAN learned very quickly: for example, the picture at the beginning of this subsection was obtained at the 19th epoch.

By the 99th epoch of training the digits were already confident, although unrealistic ones still turned up from time to time.
Satisfied with the preliminary result, I stopped training and began to think about how to solve the main problem.
The next step was reading about GANs with labels, where the discriminator and the generator are fed the class of the current picture. And after the labeled GAN I learned about CAN; the expansion of the acronym is right there in the subtopic title.
In a CAN, the discriminator also tries to guess the picture's class if the picture comes from the real dataset. Accordingly, when training on a real picture, the class-prediction error is added to the discriminator's default loss.

When training on a generated picture, the discriminator only needs to predict whether the picture is real or not.

The generator, in addition to simply deceiving the discriminator, must also make the discriminator uncertain about the picture's class: the generator is interested in the discriminator's class outputs being as far as possible from 1, that is, from full confidence.
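To summarize, here is a minimal sketch of the training targets this scheme implies (a simplification of the train_another function shown below; the variable names and example values are mine):

import numpy as np
import keras

batch_size, num_styles = 4, 3  # example values
true_labels = np.random.randint(0, num_styles, batch_size)  # placeholder classes

# discriminator on a real picture: 'real' target plus the true class
valid_target = np.ones(batch_size)
class_target = keras.utils.to_categorical(true_labels, num_styles)

# discriminator on a generated picture: only the 'fake' target
fake_target = np.zeros(batch_size)

# generator: look 'real' to the validity head, and push the class head
# towards the uniform distribution, i.e. maximal class uncertainty
gen_valid_target = np.ones(batch_size)
gen_class_target = np.full((batch_size, num_styles), 1 / num_styles)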
Turning to CAN, I again ran into difficulties and got demoralized because nothing worked or trained. After several unpleasant failures I decided to start over from scratch and save everything (yes, I hadn't done that before): the weights and the architecture, so that training could be interrupted and resumed.
At first, I wanted to make a network that would generate a single 256x256 image for me (all the following pictures are of this size) without any labels. The turning point here was the opposite of before: on each training iteration, the discriminator should be shown both the generated pictures and the real ones.
This is the result I stopped at before moving on to the next stage. Yes, the colors differ from the real picture, but I was more interested in the network's ability to pick out contours and objects. It handled that.
Then I could move on to the main task: generating art. Below I present the code, commenting on it as I go.
First, as always, you need to import all the libraries.
import glob
from PIL import Image
from keras.preprocessing.image import array_to_img, img_to_array, load_img
from datetime import date
from datetime import datetime
import tensorflow as tf
import numpy as np
import argparse
import math
import os
from matplotlib.image import imread
from scipy.misc.pilutil import imresize, imsave
import matplotlib.pyplot as plt
import cv2
import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Reshape, Flatten, Dropout, Input
from keras.layers.convolutional import Conv2D, Conv2DTranspose, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam, SGD
from keras.datasets import mnist
from keras import initializers
import random
Generator creation.
The layer outputs again differ from the article: in some places to save memory (my setup: a home computer with a GTX 970), and in some places because that configuration just happened to work.
def build_generator():
    model = Sequential()
    model.add(Dense(128 * 16 * 8, input_dim=latent_dim))  # 128*16*8 == 8*8*256, matches the Reshape below
    model.add(BatchNormalization())
    model.add(LeakyReLU())
    model.add(Reshape((8, 8, 256)))
    model.add(Conv2DTranspose(512, filter_size_g, strides=(1, 1), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(512, filter_size_g, strides=(1, 1), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(256, filter_size_g, strides=(1, 1), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    # five stride-2 layers upsample 8x8 -> 256x256
    model.add(Conv2DTranspose(128, filter_size_g, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(64, filter_size_g, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(32, filter_size_g, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(16, filter_size_g, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(8, filter_size_g, strides=(2, 2), padding='same'))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU())
    model.add(Conv2DTranspose(img_channels, filter_size_g, strides=(1, 1), padding='same'))
    model.add(Activation("tanh"))
    model.summary()
    return model
The discriminator creation function returns two models: one tries to determine whether a picture is real, and the other tries to determine the picture's class.
def build_discriminator(num_classes):
    model = Sequential()
    model.add(Conv2D(64, kernel_size=filter_size_d, strides=(2, 2), input_shape=img_shape, padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=filter_size_d, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(256, kernel_size=filter_size_d, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(512, kernel_size=filter_size_d, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(512, kernel_size=filter_size_d, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(512, kernel_size=filter_size_d, strides=(2, 2), padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.summary()

    img = Input(shape=img_shape)
    features = model(img)

    # real/fake head
    validity = Dense(1)(features)
    valid = Activation('sigmoid')(validity)

    # class head; each Dense is wired to the preceding LeakyReLU output
    label1 = Dense(1024)(features)
    lrelu1 = LeakyReLU(alpha=0.2)(label1)
    label2 = Dense(512)(lrelu1)
    lrelu2 = LeakyReLU(alpha=0.2)(label2)
    label3 = Dense(num_classes)(lrelu2)
    label = Activation('softmax')(label3)

    return Model(img, valid), Model(img, label)
A function to create the combined adversarial model. In the combined model the discriminator is not trained.
def generator_containing_discriminator(g, d, d_label):
    noise = Input(shape=(latent_dim,))
    img = g(noise)
    d.trainable = False
    d_label.trainable = False
    valid, target_label = d(img), d_label(img)
    return Model(noise, [valid, target_label])
A function for loading a batch of real pictures and their labels. data is an array of file paths, which will be defined later. The images are also normalized in this function.
def get_images_classes(batch_size, data):
    X_train = np.zeros((batch_size, img_rows, img_cols, img_channels))
    y_labels = np.zeros(batch_size)
    choice_arr = np.random.randint(0, len(data), batch_size)
    for i in range(batch_size):
        rand_number = np.random.randint(0, len(data[choice_arr[i]]))
        temp_img = cv2.imread(data[choice_arr[i]][rand_number])
        X_train[i] = temp_img
        y_labels[i] = choice_arr[i]
    X_train = (X_train - 127.5) / 127.5  # normalize pixels to [-1, 1] to match the tanh output
    return X_train, y_labels
A function for nicely tiling a batch of images into one picture. Actually, all the pictures in this article were assembled with this function.
def combine_images(generated_images):
    num = generated_images.shape[0]
    width = int(math.sqrt(num))
    height = int(math.ceil(float(num) / width))
    shape = generated_images.shape[1:3]
    image = np.zeros((height * shape[0], width * shape[1], img_channels),
                     dtype=generated_images.dtype)
    for index, img in enumerate(generated_images):
        i = int(index / width)
        j = index % width
        image[i * shape[0]:(i + 1) * shape[0], j * shape[1]:(j + 1) * shape[1]] = img[:, :, :]
    return image
And here is the data. This function returns, in a more or less convenient form, the set of paths to the pictures that we sorted into folders above.
def get_data():
    styles_folder = os.listdir(path=os.getcwd() + "\\new256_images\\")
    num_styles = len(styles_folder)
    data = []
    for i in range(num_styles):
        data.append(glob.glob(os.getcwd() + '\\new256_images\\' + styles_folder[i] + '\\*'))
    return data, num_styles
For one epoch pass I set an arbitrarily large number of batches, because I was too lazy to count the total number of pictures. The same function supports loading saved weights if training needs to be continued. The weights and architecture are saved every 5 epochs.

It is also worth mentioning that I tried adding noise to the input pictures, but for the last training run I decided against it.

Smoothed class labels are used; they help training a lot.
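For clarity, this is what the smoothing looks like in isolation (the same scheme as in train_another below: "real" becomes a random value in [0.8, 1.0], and the one-hot class vectors are jittered by roughly ±0.1; the jitter is applied after to_categorical, since to_categorical truncates floats):

import numpy as np
import keras

batch_size, num_styles = 4, 3  # example values
real_labels = np.random.randint(0, num_styles, batch_size)

# real/fake target: 'real' is a random value in [0.8, 1.0] instead of exactly 1
y = 0.8 + np.random.rand(batch_size) * 0.2

# class target: one-hot vectors jittered by a random offset in [-0.1, 0.1]
y_classif = keras.utils.to_categorical(real_labels, num_styles)
y_classif = y_classif - 0.1 + np.random.rand(batch_size, num_styles) * 0.2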
def train_another(epochs=100, BATCH_SIZE=4, weights=False, month_day='', epoch=''):
    data, num_styles = get_data()
    generator = build_generator()
    discriminator, d_label = build_discriminator(num_styles)
    discriminator.compile(loss=losses[0], optimizer=d_optim)
    d_label.compile(loss=losses[1], optimizer=d_optim)
    generator.compile(loss='binary_crossentropy', optimizer=g_optim)
    if month_day != '':
        generator.load_weights(os.getcwd() + '/' + month_day + epoch + ' gen_weights.h5')
        discriminator.load_weights(os.getcwd() + '/' + month_day + epoch + ' dis_weights.h5')
        d_label.load_weights(os.getcwd() + '/' + month_day + epoch + ' dis_label_weights.h5')
    dcgan = generator_containing_discriminator(generator, discriminator, d_label)
    dcgan.compile(loss=losses[0], optimizer=g_optim)
    # in Keras the trainable flag is fixed at compile time, so flipping it
    # below only affects models compiled afterwards
    discriminator.trainable = True
    d_label.trainable = True
    for epoch in range(epochs):
        for index in range(int(15000 / BATCH_SIZE)):
            noise = np.random.normal(0, 1, (BATCH_SIZE, latent_dim))
            real_images, real_labels = get_images_classes(BATCH_SIZE, data)
            # real_images += np.random.normal(size=img_shape, scale=0.1)
            generated_images = generator.predict(noise)

            # discriminator step on real pictures: smoothed 'real' targets and
            # smoothed one-hot class targets (smoothing is applied after
            # to_categorical, since to_categorical truncates floats)
            X = real_images
            y_classif = keras.utils.to_categorical(real_labels, num_styles)
            y_classif = y_classif - 0.1 + np.random.rand(BATCH_SIZE, num_styles) * 0.2
            y = 0.8 + np.random.rand(BATCH_SIZE) * 0.2
            d_loss = []
            d_loss.append(discriminator.train_on_batch(X, y))
            discriminator.trainable = False
            d_loss.append(d_label.train_on_batch(X, y_classif))
            print("epoch %d batch %d d_loss : %f, label_loss: %f" % (epoch, index, d_loss[0], d_loss[1]))

            # discriminator step on generated pictures: only the real/fake head
            X = generated_images
            y = np.random.rand(BATCH_SIZE) * 0.2
            d_loss = discriminator.train_on_batch(X, y)
            print("epoch %d batch %d d_loss : %f" % (epoch, index, d_loss))

            # generator step: target 'real' for the validity head and the
            # uniform distribution (maximal class uncertainty) for the class head
            noise = np.random.normal(0, 1, (BATCH_SIZE, latent_dim))
            discriminator.trainable = False
            d_label.trainable = False
            y_classif = np.full((BATCH_SIZE, num_styles), 1 / num_styles)
            y = 0.8 + np.random.rand(BATCH_SIZE) * 0.2
            g_loss = dcgan.train_on_batch(noise, [y, y_classif])
            d_label.trainable = True
            discriminator.trainable = True
            print("epoch %d batch %d g_loss : %f, label_loss: %f" % (epoch, index, g_loss[0], g_loss[1]))

            if index % 50 == 0:
                image = combine_images(generated_images)
                image = image * 127.5 + 127.5
                cv2.imwrite(os.getcwd() + '\\generated\\epoch%d_%d.png' % (epoch, index), image)
                image = combine_images(real_images)
                image = image * 127.5 + 127.5
                cv2.imwrite(os.getcwd() + '\\generated\\epoch%d_%d_data.png' % (epoch, index), image)

        if epoch % 5 == 0:
            date_today = date.today()
            month, day = date_today.month, date_today.day
            # save the discriminator architecture as json
            d_json = discriminator.to_json()
            json_file = open(os.getcwd() + "/%d.%d dis_model.json" % (day, month), "w")
            json_file.write(d_json)
            json_file.close()
            # save the label-discriminator architecture as json
            d_l_json = d_label.to_json()
            json_file = open(os.getcwd() + "/%d.%d dis_label_model.json" % (day, month), "w")
            json_file.write(d_l_json)
            json_file.close()
            # save the generator architecture as json
            gen_json = generator.to_json()
            json_file = open(os.getcwd() + "/%d.%d gen_model.json" % (day, month), "w")
            json_file.write(gen_json)
            json_file.close()
            discriminator.save_weights(os.getcwd() + '/%d.%d %d_epoch dis_weights.h5' % (day, month, epoch))
            d_label.save_weights(os.getcwd() + '/%d.%d %d_epoch dis_label_weights.h5' % (day, month, epoch))
            generator.save_weights(os.getcwd() + '/%d.%d %d_epoch gen_weights.h5' % (day, month, epoch))
Initializing the variables and starting the training. Due to my computer's limited "power", training is only possible with a batch size of at most 16 pictures.
img_rows = 256
img_cols = 256
img_channels = 3
img_shape = (img_rows, img_cols, img_channels)
latent_dim = 100
filter_size_g = (5, 5)
filter_size_d = (5, 5)
d_strides = (2, 2)
color_mode = 'rgb'
losses = ['binary_crossentropy', 'categorical_crossentropy']
g_optim = Adam(0.0002, beta_2=0.5)
d_optim = Adam(0.0002, beta_2=0.5)

train_another(1000, 16)
Actually, I had been wanting to write a post about this idea of mine for a long time. Now is not the best moment for it, because this network has been training for three days and is now on its 113th epoch, but today I found some interesting pictures, so I decided it was finally time to write the post!
These are the pictures that came out today. Perhaps by giving them titles I can convey my personal perception of them to the reader. It is quite noticeable that the network is undertrained (or maybe it will never train properly with these methods at all), especially considering that these pictures were cherry-picked, but today I got a result that I liked.
My further plan is to train this configuration until it becomes clear what it is capable of. I also plan to create a network that would upscale these pictures to a sensible size. That has already been invented and implementations exist.
I would be extremely pleased with constructive criticism, good advice and questions.
Source: https://habr.com/ru/post/431614/