
Face segmentation on a selfie without neural networks

Greetings, colleagues. It turns out that not all of today's computer vision is done with neural networks. Although many startups claim they have deep learning everywhere, I hasten to disappoint you: they mostly just want to look a bit fancier than they are. Consider, for example, the segmentation task. A whole drama unfolded in our Slack: one rich and high-tech selfie company collected datasets for selfie segmentation with neural networks (which is neither easy nor cheap), while another, poorer and less advanced one, decided it could simply pay people to annotate photos and get a dataset that way. In short, passions on the internet run as high as ever. Recently I came across an article in which very decent segmentation is done on the device without any neural networks. The algorithm requires a few hints from the user, but with dlib and OpenCV such hints are easy to automate. As a bonus, we will also smooth the cut-out face and transfer it onto some random person, thereby getting an idea of how the masks work in all those Snapchat-like apps. In short, the classics are still alive, and if you want to dip a little into classical computer vision in Python, read on.


Algorithm


We briefly describe the algorithm and then implement it step by step. Suppose we have an image, and we ask the user to draw two curves on it. The first one (blue) must lie entirely on the object of interest. The second one (green) must touch only the background of the image.



Next, the following steps are performed:

- using the points under the strokes, build color models of the object and of the background with kernel density estimation;
- with these models, estimate for every pixel the probability of belonging to the object and to the background, obtaining fuzzy masks;
- with Dijkstra's algorithm, compute for every pixel the shortest-path distances to the object strokes and to the background strokes;
- assign each pixel to the object or to the background by comparing these two distances, which yields a binary mask;
- smooth the binary mask with mathematical morphology.

The rest of the material is interleaved with Python code snippets; if you plan to follow along as you read the post, you will need the following imports:


import
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("dark")
plt.rcParams['figure.figsize'] = 16, 12
import pandas as pd
from PIL import Image
from tqdm import tqdm_notebook
from skimage import transform
import itertools as it
from sklearn.neighbors.kde import KernelDensity
import matplotlib.cm as cm
import queue
from skimage import morphology
import dlib
import cv2
from imutils import face_utils
from scipy.spatial import Delaunay
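Note: depending on your scikit-learn version, the KernelDensity import above may fail, because the private module sklearn.neighbors.kde was removed in newer releases. A safe alternative (my addition, not from the original article) is the public import path:

from sklearn.neighbors import KernelDensity  # public path, works on both old and recent scikit-learn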

We automate strokes


The idea of how to automate the strokes was inspired by the FaceApp application, which supposedly uses neural networks for its transformations. It seems to me that if they use a network anywhere, it is only for detecting facial landmarks. Take a look at the screenshot on the right: the app suggests aligning your face with the contour. The detection algorithm was probably trained at roughly this scale. As soon as the face fits into the contour, the contour frame disappears, which means the landmarks have been found. Let me introduce you to today's test subject and, at the same time, remind you what these facial landmarks look like.



img_input = np.array(Image.open('./../data/input2.jpg'))[:500, 400:, :]
print(img_input.shape)
plt.imshow(img_input)


Now let us take advantage of free open-source software and find a bounding box around the face as well as the facial landmarks; there are 68 of them.


# frontal face detector
detector = dlib.get_frontal_face_detector()
# predictor of the 68 facial landmarks
predictor = dlib.shape_predictor('./../data/shape_predictor_68_face_landmarks.dat')
# convert the image to grayscale
img_gray = cv2.cvtColor(img_input, cv2.COLOR_BGR2GRAY)
# detect face rectangles in the image
rects = detector(img_gray, 0)
# find the landmarks of the first detected face
shape = predictor(img_gray, rects[0])
shape = face_utils.shape_to_np(shape)

draw key points
img_tmp = img_input.copy()
for x, y in shape:
    cv2.circle(img_tmp, (x, y), 1, (0, 0, 255), -1)
plt.imshow(img_tmp)


The original bounding box around the face (green) is too small: we need a box that contains the whole face with some margin (red). The expansion coefficients of the box were obtained empirically by analyzing a few dozen selfies of different scales and of different people.


# the largest detected face rectangle
face_origin = sorted([(t.width()*t.height(), (t.left(), t.top(), t.width(), t.height()))
                      for t in rects],
                     key=lambda t: t[0], reverse=True)[0][1]

# empirical expansion coefficients of the box
rescale = (1.3, 2.2, 1.3, 1.3)

# expand the rectangle and clip it to the image borders
(x, y, w, h) = face_origin
cx = x + w/2
cy = y + h/2
w = min(img_input.shape[1] - x, int(w/2 + rescale[2]*w/2))
h = min(img_input.shape[0] - y, int(h/2 + rescale[3]*h/2))
fx = max(0, int(x + w/2*(1 - rescale[0])))
fy = max(0, int(y + h/2*(1 - rescale[1])))
fw = min(img_input.shape[1] - fx, int(w - w/2*(1 - rescale[0])))
fh = min(img_input.shape[0] - fy, int(h - h/2*(1 - rescale[1])))

face = (fx, fy, fw, fh)

draw frames
img_tmp = cv2.rectangle(img_input.copy(), (face[0], face[1]),
                        (face[0] + face[2], face[1] + face[3]),
                        (255, 0, 0), thickness=3, lineType=8, shift=0)
img_tmp = cv2.rectangle(img_tmp, (face_origin[0], face_origin[1]),
                        (face_origin[0] + face_origin[2], face_origin[1] + face_origin[3]),
                        (0, 255, 0), thickness=3, lineType=8, shift=0)
plt.imshow(img_tmp)


Now we have a region that definitely does not belong to the face: everything outside the red box. We pick a number of random points there and treat them as the background strokes. We also have 68 points that definitely lie on the face. To simplify the task, I pick five of them: one on each side of the face at eye level, one on each side at mouth level, and one at the bottom in the middle of the chin. Every point inside this pentagon belongs only to the face. Again for simplicity, we assume the face is vertical in the image, so we can mirror the pentagon about the horizontal line through the eye-level points and obtain an octagon. Everything inside the octagon is treated as the object stroke.


# the five key points on the face, plus their mirrored copies
points = [shape[0].tolist(), shape[16].tolist()]
for ix in [4, 12, 8]:
    x, y = shape[ix].tolist()
    points.append((x, y))
    points.append((x, points[0][1] + points[0][1] - y))

# build a Delaunay triangulation of the octagon; since the octagon is convex,
# find_simplex() >= 0 serves as a point-in-polygon test,
# not the most optimal way, but :good-enough:
hull = Delaunay(points)
xy_fg = []
for x, y in it.product(range(img_input.shape[0]), range(img_input.shape[1])):
    if hull.find_simplex([y, x]) >= 0:
        xy_fg.append((x, y))
print('xy_fg%:', len(xy_fg)/np.prod(img_input.shape))

# how many background points to sample:
# a fixed fraction k of the image area
r = face[1]*face[3]/np.prod(img_input.shape[:2])  # auxiliary ratio, printed for reference
print(r)
k = 0.1
xy_bg_n = int(k*np.prod(img_input.shape[:2]))
print(xy_bg_n)

# sample random points and keep only those outside the red face box
xy_bg = zip(np.random.uniform(0, img_input.shape[0], size=xy_bg_n).astype(int),
            np.random.uniform(0, img_input.shape[1], size=xy_bg_n).astype(int))
xy_bg = list(xy_bg)
xy_bg = [(x, y) for (x, y) in xy_bg
         if y < face[0] or y > face[0] + face[2] or x < face[1] or x > face[1] + face[3]]
print(len(xy_bg)/np.prod(img_input.shape[:2]))

draw strokes
img_tmp = img_input/255
for x, y in xy_fg:
    img_tmp[x, y, :] = img_tmp[x, y, :]*0.5 + np.array([1, 0, 0]) * 0.5
for x, y in xy_bg:
    img_tmp[x, y, :] = img_tmp[x, y, :]*0.5 + np.array([0, 0, 1]) * 0.5
plt.imshow(img_tmp)


Fuzzy separation of background and object


Now we have two data sets: the object points $D_f$ and the background points $D_b$.


points_fg = np.array([img_input[x, y, :] for (x, y) in xy_fg])
points_bg = np.array([img_input[x, y, :] for (x, y) in xy_bg])

Let's look at the distribution of colors on the RGB channels in each of the sets. The first histogram is for the object, the second is for the background.


drawing distributions
fig, axes = plt.subplots(nrows=2, ncols=1)
sns.distplot(points_fg[:, 0], ax=axes[0], color='r')
sns.distplot(points_fg[:, 1], ax=axes[0], color='g')
sns.distplot(points_fg[:, 2], ax=axes[0], color='b')
sns.distplot(points_bg[:, 0], ax=axes[1], color='r')
sns.distplot(points_bg[:, 1], ax=axes[1], color='g')
sns.distplot(points_bg[:, 2], ax=axes[1], color='b')



I am glad that the distributions are different. It means that if we can build functions estimating the probability that a point comes from each of the two distributions, we will get fuzzy masks. And there is such a tool: kernel density estimation. Given a set of points $D$, the density at a new point $x$ can be estimated as follows (for simplicity, the example is for a one-dimensional distribution):


$$F\left(x\right) = \frac{1}{h \cdot \left|D\right|} \sum_{i=1}^{\left|D\right|} K\!\left(\frac{x - x_i}{h}\right)$$


Where:

- $D$ is the set of sample points ($x_i \in D$), and $\left|D\right|$ is its size;
- $h$ is the bandwidth, the smoothing parameter of the estimate;
- $K$ is the kernel function.

For simplicity, we will use the Gaussian kernel:


$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}$$
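To connect the formula with the code below, here is a minimal numpy sketch (my own illustration, not from the original article) of the estimator above for one-dimensional data with the Gaussian kernel:

import numpy as np

def gaussian_kernel(u):
    # K(u) = exp(-u^2 / 2) / sqrt(2*pi)
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde_estimate(x, data, h=1.0):
    # F(x) = 1 / (h * |D|) * sum_i K((x - x_i) / h)
    return gaussian_kernel((x - data) / h).sum() / (h * len(data))

# toy example: density estimate at x = 0.5 from five samples
samples = np.array([0.0, 0.2, 0.4, 1.0, 1.5])
print(kde_estimate(0.5, samples, h=0.3))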


Although, in terms of speed, the Gaussian kernel is not the best choice: with the Epanechnikov kernel everything would be computed faster. I will also use KernelDensity from sklearn, which in the end results in about 5 minutes of scoring. The authors of the article claim that replacing the KDE with an optimal implementation brings the computation on the device down to one second.


# fit a KDE for the object colors and a KDE for the background colors
kde_fg = KernelDensity(kernel='gaussian', bandwidth=1,
                       algorithm='kd_tree', leaf_size=100).fit(points_fg)
kde_bg = KernelDensity(kernel='gaussian', bandwidth=1,
                       algorithm='kd_tree', leaf_size=100).fit(points_bg)

# score every pixel of the image with both estimators
score_kde_fg = np.zeros(img_input.shape[:2])
score_kde_bg = np.zeros(img_input.shape[:2])
likelihood_fg = np.zeros(img_input.shape[:2])
coordinates = it.product(range(score_kde_fg.shape[0]), range(score_kde_fg.shape[1]))
for x, y in tqdm_notebook(coordinates, total=np.prod(score_kde_fg.shape)):
    score_kde_fg[x, y] = np.exp(kde_fg.score(img_input[x, y, :].reshape(1, -1)))
    score_kde_bg[x, y] = np.exp(kde_bg.score(img_input[x, y, :].reshape(1, -1)))
    n = score_kde_fg[x, y] + score_kde_bg[x, y]
    if n == 0:
        n = 1
    likelihood_fg[x, y] = score_kde_fg[x, y]/n
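A couple of easy speed-ups are worth mentioning (this is my own sketch, not the article's implementation): sklearn's KernelDensity also accepts kernel='epanechnikov', and score_samples lets you score all pixels in a single vectorized call instead of calling score once per pixel:

from sklearn.neighbors import KernelDensity

# the same estimators, but with the Epanechnikov kernel
kde_fg = KernelDensity(kernel='epanechnikov', bandwidth=1,
                       algorithm='kd_tree', leaf_size=100).fit(points_fg)
kde_bg = KernelDensity(kernel='epanechnikov', bandwidth=1,
                       algorithm='kd_tree', leaf_size=100).fit(points_bg)

# score every pixel at once: reshape the image into an (N, 3) array of colors
pixels = img_input.reshape(-1, 3)
score_kde_fg = np.exp(kde_fg.score_samples(pixels)).reshape(img_input.shape[:2])
score_kde_bg = np.exp(kde_bg.score_samples(pixels)).reshape(img_input.shape[:2])

# fuzzy object mask, avoiding division by zero
n = score_kde_fg + score_kde_bg
n[n == 0] = 1
likelihood_fg = score_kde_fg / n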

As a result, we have several masks:

- score_kde_fg — the density of the object color model at each pixel;
- score_kde_bg — the density of the background color model at each pixel;
- likelihood_fg — the normalized probability that a pixel belongs to the object;
- 1 - likelihood_fg — the probability that a pixel belongs to the background.

Look at the following distributions.


The distribution of score_kde_fg values
sns.distplot(score_kde_fg.flatten())
plt.show()


The distribution of score_kde_bg
sns.distplot(score_kde_bg.flatten())
plt.show()


The distribution of likelihood_fg values:


sns.distplot(likelihood_fg.flatten())
plt.show()


It is encouraging that the distribution of $p_f(x)$ (likelihood_fg) has two peaks, and the number of points assigned to the face is clearly no smaller than the number of background points. Let's draw the resulting masks.


mask score_kde_fg
plt.matshow(score_kde_fg, cmap=cm.bwr)
plt.show()


mask score_kde_bg
plt.matshow(score_kde_bg, cmap=cm.bwr)
plt.show()


mask likelihood_fg
plt.matshow(likelihood_fg, cmap=cm.bwr)
plt.show()


mask 1 - likelihood_fg
plt.matshow(1 - likelihood_fg, cmap=cm.bwr)
plt.show()


Unfortunately, part of the door jamb also ended up classified as face. Well, at least it is not far from the face; we will use this property in the next part.





Binary object mask


Let's represent the image as a graph whose nodes are the pixels, and where each pixel is connected by edges to its four neighbours: the points above and below it and to its left and right. The weight of an edge is the absolute difference between the probabilities of its two endpoints belonging to the object (or to the background):


$$d\left(a, b\right) = \left|\,p\left(a\right) - p\left(b\right)\right|$$


Accordingly, the closer the probabilities are to each other, the lower the weight of the edge between the points. We use Dijkstra's algorithm to find the shortest paths, and their lengths, from the stroke points to all other pixels. We run the algorithm twice: once with the map of probabilities of belonging to the object as input, and once with the probabilities of belonging to the background. The notion of distance is built right into the algorithm, and the distance between points belonging to the same group of strokes (object or background) is zero, so within Dijkstra's algorithm we can simply initialize all these stroke points with zero distance.


def dijkstra(start_points, w):
    d = np.zeros(w.shape) + np.inf     # distances, initially infinite
    v = np.zeros(w.shape, dtype=bool)  # visited flags
    q = queue.PriorityQueue()
    # all stroke points start with zero distance
    for x, y in start_points:
        d[x, y] = 0
        q.put((d[x, y], (x, y)))
    for x, y in it.product(range(w.shape[0]), range(w.shape[1])):
        if np.isinf(d[x, y]):
            q.put((d[x, y], (x, y)))
    while not q.empty():
        _, p = q.get()
        if v[p]:
            continue
        # the four neighbours of the current pixel
        neighbourhood = []
        if p[0] - 1 >= 0:
            neighbourhood.append((p[0] - 1, p[1]))
        if p[0] + 1 <= w.shape[0] - 1:
            neighbourhood.append((p[0] + 1, p[1]))
        if p[1] - 1 >= 0:
            neighbourhood.append((p[0], p[1] - 1))
        if p[1] + 1 < w.shape[1]:
            neighbourhood.append((p[0], p[1] + 1))
        for x, y in neighbourhood:
            # relax the edge: its weight is the probability difference
            d_tmp = d[p] + np.abs(w[x, y] - w[p])
            if d[x, y] > d_tmp:
                d[x, y] = d_tmp
                q.put((d[x, y], (x, y)))
        v[p] = True
    return d

# distances to the object strokes and to the background strokes
d_fg = dijkstra(xy_fg, likelihood_fg)
d_bg = dijkstra(xy_bg, 1 - likelihood_fg)

new fuzzy object mask
plt.matshow(d_fg, cmap=cm.bwr)
plt.show()


new fuzzy background mask
plt.matshow(d_bg, cmap=cm.bwr)
plt.show()


And now we assign to the object all the points whose distance to the object strokes is smaller than their distance to the background strokes (a margin can be added).


margin = 0.0
mask = (d_fg < (d_bg + margin)).astype(np.uint8)
plt.matshow(mask)
plt.show()


You can send yourself into space.


img_fg = img_input/255.0
img_bg = (np.array(Image.open('./../data/background.jpg'))/255.0)[:800, :800, :]
x = int(img_bg.shape[0] - img_fg.shape[0])
y = int(img_bg.shape[1]/2 - img_fg.shape[1]/2)
img_bg_fg = img_bg[x:(x + img_fg.shape[0]), y:(y + img_fg.shape[1]), :]
mask_3d = np.dstack([mask, mask, mask])
img_bg[x:(x + img_fg.shape[0]), y:(y + img_fg.shape[1]), :] = mask_3d*img_fg + (1 - mask_3d)*img_bg_fg
plt.imshow(img_bg)


Mask smoothing


You probably noticed that the mask is slightly torn at the edges. But this is easily corrected by the methods of mathematical morphology .



Suppose we have a structuring element (SE) of the "disk" type: a binary disk-shaped mask.
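As a small illustration (not from the article), skimage can generate such a structuring element directly: morphology.disk(radius) returns a binary array with ones inside a disk of the given radius.

from skimage import morphology

# a small disk-shaped structuring element of radius 2
print(morphology.disk(2))
# [[0 0 1 0 0]
#  [0 1 1 1 0]
#  [1 1 1 1 1]
#  [0 1 1 1 0]
#  [0 0 1 0 0]]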



We will use morphological opening, i.e. erosion followed by dilation: the erosion first removes the "hairiness" at the edges, and the dilation then restores the original size (the object "loses weight" after erosion).


mask = morphology.opening(mask, morphology.disk(11))
plt.imshow(mask)


After applying such a mask, the result will be more pleasant:


mask application code
img_fg = img_input/255.0
img_bg = (np.array(Image.open('./../data/background.jpg'))/255.0)[:800, :800, :]
x = int(img_bg.shape[0] - img_fg.shape[0])
y = int(img_bg.shape[1]/2 - img_fg.shape[1]/2)
img_bg_fg = img_bg[x:(x + img_fg.shape[0]), y:(y + img_fg.shape[1]), :]
mask_3d = np.dstack([mask, mask, mask])
img_bg[x:(x + img_fg.shape[0]), y:(y + img_fg.shape[1]), :] = \
    mask_3d*img_fg + (1 - mask_3d)*img_bg_fg
plt.imshow(img_bg)


Transferring the mask


Take a random picture from the Internet for the face transfer experiment.


Test subject
img_target = np.array(Image.open('./../data/target.jpg'))
img_target = (transform.rescale(img_target, scale=0.5, mode='constant')*255).astype(np.uint8)
print(img_target.shape)
plt.imshow(img_target)


We find all 68 landmark points on the target face; let me remind you that they come in the same order as on any other face.


img_gray = cv2.cvtColor(img_target, cv2.COLOR_BGR2GRAY)
rects_target = detector(img_gray, 0)
shape_target = predictor(img_gray, rects_target[0])
shape_target = face_utils.shape_to_np(shape_target)

To transfer one face onto another, we need to scale the first face to the second one, rotate it and shift it, i.e. apply some affine transformation to the first face. And not just any affine transformation, but a very specific one: the one that maps the 68 landmark points of the first face onto the 68 landmark points of the second. It turns out that to obtain the affine transform operator we need to solve a linear regression problem.


$$\left(\begin{array}{ccc} x_{1,1} & x_{1,2} & 1 \\ x_{2,1} & x_{2,2} & 1 \\ & \cdots & \\ x_{68,1} & x_{68,2} & 1 \end{array}\right) \times \left(\begin{array}{ccc} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{array}\right) = \left(\begin{array}{ccc} y_{1,1} & y_{1,2} & 1 \\ y_{2,1} & y_{2,2} & 1 \\ & \cdots & \\ y_{68,1} & y_{68,2} & 1 \end{array}\right)$$


This equation is easily solved using a pseudoinverse matrix :


$$X \cdot A = Y \;\Rightarrow\; A = \left(X^T X\right)^{-1} X^T Y$$


So let's do just that:


# landmark coordinates in homogeneous form:
# X for the source face, Y for the target face
X = np.hstack((shape, np.ones(shape.shape[0])[:, np.newaxis]))
Y = np.hstack((shape_target, np.ones(shape_target.shape[0])[:, np.newaxis]))

# the affine transform operator via the pseudoinverse
A = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), Y)

# coordinates of all mask pixels in homogeneous form
X = np.array([(y, x, 1) for (x, y) in it.product(range(mask.shape[0]), range(mask.shape[1]))
              if mask[x, y] == 1.0])

# their coordinates on the target image
Y = np.dot(X, A).astype(int)
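As a side note (my own sketch, not part of the original article), the same least-squares problem can be solved in a numerically more robust way with np.linalg.lstsq, which avoids explicitly inverting X^T X:

# rebuild the landmark matrices (X was reused above for the mask pixel coordinates)
X_src = np.hstack((shape, np.ones(shape.shape[0])[:, np.newaxis]))
Y_dst = np.hstack((shape_target, np.ones(shape_target.shape[0])[:, np.newaxis]))

# solve min ||X_src @ A - Y_dst||^2 directly, without forming (X^T X)^{-1}
A_lstsq, residuals, rank, _ = np.linalg.lstsq(X_src, Y_dst, rcond=None)

# up to numerical error this coincides with the pseudoinverse solution A above;
# OpenCV's cv2.estimateAffine2D(shape, shape_target) is yet another option,
# though it returns a 2x3 matrix in a different layout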

draw the transferred mask
img_tmp = img_target.copy()
for y, x, _ in Y:
    if x < 0 or x >= img_target.shape[0] or y < 0 or y >= img_target.shape[1]:
        continue
    img_tmp[x, y, :] = np.array([0, 0, 0])
plt.imshow(img_tmp)


Transfer face
img_trans = img_target.copy().astype(np.uint8)
points_face = {}
for ix in range(X.shape[0]):
    y1, x1, _ = X[ix, :]
    y2, x2, _ = Y[ix, :]
    if x2 < 0 or x2 >= img_target.shape[0] or y2 < 0 or y2 >= img_target.shape[1]:
        continue
    points_face[(x2, y2)] = img_input[x1, y1, :]
for (x, y), c in points_face.items():
    img_trans[x, y, :] = c
plt.imshow(img_trans)


Conclusion


As homework, you can try making further improvements to the algorithm yourself.



The source notebook is here. Have a great time.


As usual, thanks to bauchgefuehl for the edits.



Source: https://habr.com/ru/post/336594/

