Creating a mosaic image

You have probably seen pictures like these on the Internet many times:

I decided to write a universal script for creating such images.

Theoretical part

Let's briefly discuss how we are going to do this. Suppose we have a limited set of images from which we can assemble the canvas, plus one image that must be rendered as a mosaic. We split that image into identical areas and then replace each area with an image from the dataset.

This raises the question of how to decide which dataset image should replace a given area. Of course, the ideal tiling of an area is the area itself. Each area of size $m \times n$ can be described by $3 \times n \times m$ numbers (each pixel corresponds to three numbers: its R, G and B components). In other words, each area is defined by a three-dimensional tensor. So, to measure how well a picture tiles an area, provided their sizes coincide, we need to compute some loss function. For this problem we can consider the MSE of the two tensors:

$MSE(x, y) = \frac{\sum\limits_{i=1}^{N} (x_i - y_i)^2}{N}$

Here $N$ is the number of features, in our case $3 \times n \times m$.
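As a rough illustration of the formula above, here is a minimal sketch of the MSE between two equally sized RGB tiles. The helper name `mse` and the two tiny tiles are made up for the example; the conversion to floating point is there because subtracting `uint8` arrays directly would wrap around.

```python
import numpy as np


def mse(x, y):
    """Mean squared error between two arrays of identical shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("tiles must have the same shape")
    return float(np.mean((x - y) ** 2))


# Two hypothetical 2x2 RGB tiles: one black, one uniformly dark gray
tile_a = np.zeros((2, 2, 3), dtype=np.uint8)
tile_b = np.full((2, 2, 3), 10, dtype=np.uint8)

print(mse(tile_a, tile_a))  # 0.0
print(mse(tile_a, tile_b))  # 100.0
```

Every one of the $3 \times n \times m$ features differs by 10 in the second case, so the mean of the squared differences is 100.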

However, this formula is hardly usable in practice. When the dataset is large and the areas into which the original image is divided are small, it requires an unacceptable amount of work: every dataset image must be shrunk to the size of the area, and the MSE must be computed over $3 \times n \times m$ features. More precisely, the problem is that we are forced to process absolutely every dataset image for comparison, and not just once but once for each area of the original picture.

I propose the following solution: we sacrifice a bit of quality and characterize each dataset picture with only 3 numbers, the average R, G and B of the image. Of course, this causes several problems. First, the ideal tiling of an area is no longer only the area itself but also, for example, its flipped copy (obviously a worse tiling than the first). Second, the average color may have R, G and B values that no pixel of the image actually has (in other words, it is hard to claim that our eye perceives an image as a mixture of all its colors). However, I did not think of a better way.
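Both drawbacks can be seen on a toy example. The sketch below uses a hypothetical helper `mean_rgb` and a made-up two-pixel image: the mirrored copy gets exactly the same three numbers, and the resulting average color does not occur anywhere in the image.

```python
import numpy as np


def mean_rgb(img):
    """Average R, G and B over all pixels of an (H, W, 3) RGB array."""
    return tuple(float(c) for c in img.reshape(-1, 3).mean(axis=0))


# Hypothetical 1x2 image: one pure red pixel and one pure blue pixel
img = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
flipped = img[:, ::-1]  # mirrored left to right

print(mean_rgb(img))      # (127.5, 0.0, 127.5)
print(mean_rgb(flipped))  # (127.5, 0.0, 127.5)
```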

So we only need to compute the average RGB of each dataset image once and then reuse the result.

Summarizing the above: for each area we need to pick the closest RGB point from the set and then tile the area with the dataset image that this average RGB belongs to. To compare an area with such a point we do the same thing: reduce the area to three numbers and find the closest average RGB. That is, given $R, G, B$, we must find in the set such $R_i, G_i, B_i$ that the Euclidean distance between these two points in three-dimensional space is minimal:

$\sqrt{(R - R_i)^2 + (G - G_i)^2 + (B - B_i)^2} \to \min$
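The search for the nearest point can be sketched with NumPy as follows. The function name `nearest_index` and the three-color palette are assumptions for illustration only. The sketch compares squared distances, which have the same minimizer as the distances themselves.

```python
import numpy as np


def nearest_index(target, palette):
    """Index of the palette row closest to target in Euclidean distance."""
    palette = np.asarray(palette, dtype=np.float64)
    diff = palette - np.asarray(target, dtype=np.float64)
    # The squared distance has the same argmin as the distance itself
    return int(np.argmin((diff ** 2).sum(axis=1)))


# Hypothetical average colors of three dataset images
palette = [(200, 30, 30), (30, 200, 30), (30, 30, 200)]

print(nearest_index((180, 60, 50), palette))  # 0
```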

Dataset preprocessing

You can collect your own dataset of pictures. I used a combination of datasets with images of cats and dogs.

As I wrote above, we can compute the average RGB values of the dataset images once and simply save them. That is what we do here:

    import os
    import pickle

    import cv2

    items = {}

    # cv2 loads images as BGR, so each one is converted to RGB before averaging
    for path in os.listdir('dogs_images_dataset'):
        # the dog images are stored in subdirectories, one level down
        for file in os.listdir(os.path.join('dogs_images_dataset', path)):
            file1 = os.path.join('dogs_images_dataset', path, file)
            img = cv2.cvtColor(cv2.imread(file1), cv2.COLOR_BGR2RGB)
            r = round(img[:, :, 0].mean())
            g = round(img[:, :, 1].mean())
            b = round(img[:, :, 2].mean())
            items[file1] = (r, g, b,)

    for file in os.listdir('cats_images_dataset'):
        # the cat images are stored directly in the dataset directory
        file1 = os.path.join('cats_images_dataset', file)
        img = cv2.cvtColor(cv2.imread(file1), cv2.COLOR_BGR2RGB)
        r = round(img[:, :, 0].mean())
        g = round(img[:, :, 1].mean())
        b = round(img[:, :, 2].mean())
        items[file1] = (r, g, b,)

    # save the mapping {file path: average (R, G, B)} for later use
    with open('data.pickle', 'wb') as f:
        pickle.dump(items, f)

This script takes a relatively long time to run, after which the information we need is saved in the data.pickle file.

Mosaic creation

Finally, we turn to creating the mosaic. First, we write the necessary imports and declare several constants:

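    import os
    import pickle
    from math import sqrt

    import cv2
    import numpy as np

    PATH_TO_PICTURE = ''  # directory containing the source picture
    PICTURE = 'picture.png'  # file name of the source picture
    VERTICAL_SECTION_SIZE = 7  # tile size in pixels along array axis 1 (columns)
    HORIZONTAL_SECTION_SIZE = 7  # tile size in pixels along array axis 0 (rows)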

We get the saved data from the file:

    with open('data.pickle', 'rb') as f:
        items = pickle.load(f)

We describe the loss function:

    def lost_function(r_segm, g_segm, b_segm, arg):
        # arg is a (file path, (r, g, b)) pair taken from items.items()
        r, g, b = arg[1]
        return sqrt((r - r_segm) ** 2 + (g - g_segm) ** 2 + (b - b_segm) ** 2)

Open the original image:

    file = os.path.join(PATH_TO_PICTURE, PICTURE)
    img = np.array(cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2RGB))
    size = img.shape
    x, y = size[0], size[1]

Note that tiling is possible if and only if $(x_{orig} \mathrel{\vdots} x) \wedge (y_{orig} \mathrel{\vdots} y)$, where $x_{orig}, y_{orig}$ are the dimensions of the original image, $x, y$ are the dimensions of a tile, and $a \mathrel{\vdots} b$ means that $a$ is divisible by $b$. Of course, this condition is not always satisfied. Therefore we bring the original image to a suitable size by subtracting from each of its dimensions the remainder of dividing it by the tile size:

    img = cv2.resize(img, (y - (y % VERTICAL_SECTION_SIZE), x - (x % HORIZONTAL_SECTION_SIZE)))
    size = img.shape
    x, y = size[0], size[1]
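Note that `cv2.resize` rescales the whole picture to the new size rather than cutting anything off. If literal cropping is preferred, a possible alternative is plain array slicing. The helper `crop_to_multiple` below is a hypothetical sketch, not part of the original script.

```python
import numpy as np


def crop_to_multiple(img, tile_h, tile_w):
    """Drop the bottom rows and right columns that do not fill a whole tile."""
    h, w = img.shape[:2]
    return img[:h - h % tile_h, :w - w % tile_w]


# Hypothetical 100x150 picture cut down for 7x7 tiles
img = np.zeros((100, 150, 3), dtype=np.uint8)

print(crop_to_multiple(img, 7, 7).shape)  # (98, 147, 3)
```

Here 100 leaves a remainder of 2 when divided by 7 and 150 leaves a remainder of 3, so two rows and three columns are discarded.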

We now proceed directly to the tiling:

    for i in range(x // HORIZONTAL_SECTION_SIZE):
        for j in range(y // VERTICAL_SECTION_SIZE):
            sect = img[i * HORIZONTAL_SECTION_SIZE:(i + 1) * HORIZONTAL_SECTION_SIZE,
                       j * VERTICAL_SECTION_SIZE:(j + 1) * VERTICAL_SECTION_SIZE]
            r_mean, g_mean, b_mean = sect[:, :, 0].mean(), sect[:, :, 1].mean(), sect[:, :, 2].mean()

Here the penultimate statement selects the required area of the picture, and the last one computes its average RGB components.

Now consider one of the most important lines:

            current = sorted(items.items(), key=lambda argument: lost_function(r_mean, g_mean, b_mean, argument))[0]

This line sorts all dataset images in ascending order of their loss value and takes the first one, i.e. the argmin.
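Since only the smallest element is needed, a full sort is not strictly necessary. The sketch below, which uses a tiny made-up `items` dictionary with hypothetical file names, suggests that Python's built-in `min` with the same key picks the same entry while scanning the dataset only once.

```python
from math import sqrt


def lost_function(r_segm, g_segm, b_segm, arg):
    r, g, b = arg[1]
    return sqrt((r - r_segm) ** 2 + (g - g_segm) ** 2 + (b - b_segm) ** 2)


# Hypothetical precomputed averages; the real ones come from data.pickle
items = {
    'a.jpg': (10, 10, 10),
    'b.jpg': (200, 200, 200),
    'c.jpg': (90, 100, 110),
}
r_mean, g_mean, b_mean = 100, 100, 100


def key(argument):
    return lost_function(r_mean, g_mean, b_mean, argument)


by_sort = sorted(items.items(), key=key)[0]  # O(n log n)
by_min = min(items.items(), key=key)         # O(n)

print(by_min)             # ('c.jpg', (90, 100, 110))
print(by_sort == by_min)  # True
```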

Now we just have to resize the chosen image to the tile size and replace the area with it:

            resized = cv2.resize(cv2.cvtColor(cv2.imread(current[0]), cv2.COLOR_BGR2RGB),
                                 (VERTICAL_SECTION_SIZE, HORIZONTAL_SECTION_SIZE,))
            img[i * HORIZONTAL_SECTION_SIZE:(i + 1) * HORIZONTAL_SECTION_SIZE,
                j * VERTICAL_SECTION_SIZE:(j + 1) * VERTICAL_SECTION_SIZE] = resized

Finally, we display the resulting image on the screen:

    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2.imshow('ImageWindow', img)
    cv2.waitKey(0)

A little more about the loss function

In general, there are several variants of the loss function, each of which is theoretically applicable to this problem. Their quality can only be judged empirically, which you are welcome to do :)

$|\Delta R| + |\Delta G| + |\Delta B|$

$\sqrt{\Delta R^2 + \Delta G^2 + \Delta B^2}$

$\sqrt{0.2126 \Delta R^2 + 0.7152 \Delta G^2 + 0.0722 \Delta B^2}$

$\sqrt{0.2126^2 \Delta R^2 + 0.7152^2 \Delta G^2 + 0.0722^2 \Delta B^2}$
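For anyone who wants to experiment, the four variants above could be written roughly as follows. The function names are made up for this sketch; each function takes the per-channel differences $\Delta R$, $\Delta G$, $\Delta B$. The weights are the ones from the formulas (they coincide with the Rec. 709 luma coefficients and sum to 1).

```python
from math import sqrt

# Channel weights taken from the formulas above
W_R, W_G, W_B = 0.2126, 0.7152, 0.0722


def manhattan(dr, dg, db):
    return abs(dr) + abs(dg) + abs(db)


def euclidean(dr, dg, db):
    return sqrt(dr ** 2 + dg ** 2 + db ** 2)


def weighted(dr, dg, db):
    # weights applied to the squared differences
    return sqrt(W_R * dr ** 2 + W_G * dg ** 2 + W_B * db ** 2)


def weighted_squared(dr, dg, db):
    # weights applied to the differences before squaring
    return sqrt((W_R * dr) ** 2 + (W_G * dg) ** 2 + (W_B * db) ** 2)


print(manhattan(3, -4, 0))  # 7
print(euclidean(3, -4, 0))  # 5.0
```

Any of these can be substituted for the distance in `lost_function` by passing the three differences between the tile's and the dataset image's average colors.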