Face Recognition: Creating and Trying On Masks

While the iOS developer community argues about how to structure projects, tries to decide between MVVM and VIPER, splits projects into subprojects or bolts a reactive jet engine onto them, I will try to step away from all that and look under the hood of another Hype-Driven Development technology.

In 2017, machine learning tops the hype charts, and it is clear why:

There are more open datasets.
Suitable hardware has appeared, including cloud solutions.
Technologies from this area have started to be used in production projects.

Machine learning is a broad topic, so let's focus on face recognition and figure out which technologies existed before the birth of ~~Christ~~ CoreML, and what appeared after Apple released the framework.

Face recognition theory

Face recognition is a practical application of pattern recognition theory. It consists of two subtasks: identification and classification (the differences are detailed here). Identifying specific people is actively used in modern services such as Facebook and iPhoto. Face recognition is used everywhere, from Face ID on the iPhone X to target guidance in military technology.

A person recognizes other people's faces thanks to a region of the brain at the border of the occipital and temporal lobes, the fusiform gyrus. We start telling people apart at around 4 months of age. The key features the brain picks out are the eyes, nose, mouth, and eyebrows. The brain can also reconstruct a whole face from half of it and identify a person from only part of the face. It averages all the faces it has seen and then looks for differences from this average. That is why, to people of European appearance, East Asian faces can all seem alike, and East Asians in turn find it hard to tell Europeans apart. Internal recognition is tuned to the range of faces a person has stored in memory, so if part of that range lacks data, different faces are perceived as the same.

People have been working on face recognition for over 40 years. The problems include:

Finding and recognizing multiple faces in a video stream.
Robustness to changes in the face: hairstyle, beard, glasses, age, and head rotation.
Scaling the data used to identify a person.
Working in real time.

One of the best algorithms for finding a face in an image and extracting it is the histogram of oriented gradients (HOG).
There are other algorithms as well. Here you can find a detailed description of how the Viola-Jones algorithm searches for the face region. It is less accurate and handles rotated faces worse.
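To make the HOG idea more concrete, here is a minimal sketch of its core step, assuming a grayscale image stored row by row in a `[Double]` array: gradients are computed with central differences, and each pixel of a cell votes into one of 9 unsigned orientation bins (0 to 180 degrees), weighted by gradient magnitude. A full detector additionally normalizes histograms over overlapping blocks and feeds them to a classifier such as a linear SVM; `cellHistogram` is a hypothetical name, not a library function.

```swift
import Foundation

/// Orientation histogram of one HOG cell (illustrative helper, not a library API).
/// `image` holds grayscale values row by row, `width` pixels per row.
func cellHistogram(image: [Double], width: Int, height: Int,
                   cellX: Int, cellY: Int, cellSize: Int, bins: Int = 9) -> [Double] {
    var histogram = [Double](repeating: 0, count: bins)
    let binWidth = 180.0 / Double(bins)
    // Skip the image border, where central differences are undefined
    let xStart = max(cellX, 1), xEnd = min(cellX + cellSize, width - 1)
    let yStart = max(cellY, 1), yEnd = min(cellY + cellSize, height - 1)
    guard xStart < xEnd, yStart < yEnd else { return histogram }
    for y in yStart..<yEnd {
        for x in xStart..<xEnd {
            // Central differences approximate the horizontal and vertical gradient
            let gx = image[y * width + x + 1] - image[y * width + x - 1]
            let gy = image[(y + 1) * width + x] - image[(y - 1) * width + x]
            let magnitude = (gx * gx + gy * gy).squareRoot()
            guard magnitude > 0 else { continue }
            // Unsigned orientation in [0, 180) degrees
            var angle = atan2(gy, gx) * 180 / Double.pi
            if angle < 0 { angle += 180 }
            if angle >= 180 { angle -= 180 }
            // Each pixel votes for its orientation bin, weighted by gradient magnitude
            histogram[min(Int(angle / binWidth), bins - 1)] += magnitude
        }
    }
    return histogram
}
```

A horizontal brightness ramp puts all the weight into bin 0 (vertical edges), while a vertical ramp lands in the 80 to 100 degree bin; it is these per-cell distributions that make the descriptor robust to lighting changes.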

A brief tour of pattern recognition technologies and solutions

There are many solutions that include pattern recognition algorithms. Popular libraries used on iOS:

Figure 1. DLIB library structure

DLIB

Pros:
- Open source: you can contribute to development and follow current trends.
- Written in C++. Available for iOS via CocoaPods: pod 'dlib'.
- Can also be integrated as a plain C++ library. Runs on Windows, Linux, and macOS. Can be used from Swift apps through an Objective-C++ wrapper.
Cons:
- Large library size: 40 MB as a pod.
- Steep learning curve. There are many internal algorithms, and each of them needs its own Objective-C wrapper.

Figure 2. The structure of the library OpenCV

OpenCV (Open Source Computer Vision Library)

Pros:
- The largest community, which actively maintains the project.
- Written in C++. Available for iOS via CocoaPods: pod 'OpenCV'.
Cons:
- Steep learning curve.
- Large library size: 77 MB as a pod, 180 MB as a C++ library.

Figure 3. CoreML structure

iOS Vision Framework

Pros:
- Easy to integrate into an app.
- Comes with a handy converter (coremltools) that supports models from several other frameworks (Keras, Caffe, scikit-learn).
- Built-in solution with a small footprint.
- Runs on the GPU.
Cons:
- It works on top of CoreML, so it supports only a limited set of model types from other frameworks.
- No support for TensorFlow, one of the most popular machine learning frameworks, so you have to spend a lot of time on home-made converters.
- It is a high-level abstraction. The implementation is closed, so you cannot control it.
- Requires iOS 11+.

There are paid platforms that provide solutions for the problem of pattern recognition. Most develop their own algorithms and technologies. Of course, these technologies are being actively developed and used by the military, so some solutions are classified and do not have open source.

What are landmarks

Figure 4. Visual display of facial structures.

Landmarks are the key points of a face. The first step of the algorithm is to locate the face in the image. Once the face is located, the algorithm searches for the key contours:

The contour of the face.
Left eye.
Right eye.
Left eyebrow.
Right eyebrow.
Left pupil.
Right pupil.
Nose.
Lips.

Each of these contours is an array of points in the plane.
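Since every contour is just an array of points, a natural in-memory model is a dictionary from region to points. A minimal sketch with hypothetical type names (`FaceRegion`, `FaceLandmarks`); the `centroid` helper shows a cheap derived value, for example a rough eye center computed as the mean of the eye contour's points.

```swift
import Foundation

// Hypothetical model of the contours listed above
enum FaceRegion: CaseIterable {
    case faceContour, leftEye, rightEye, leftEyebrow, rightEyebrow
    case leftPupil, rightPupil, nose, lips
}

struct FaceLandmarks {
    var contours: [FaceRegion: [CGPoint]] = [:]

    /// Arithmetic mean of a contour's points, or nil if the contour is missing or empty
    func centroid(of region: FaceRegion) -> CGPoint? {
        guard let points = contours[region], !points.isEmpty else { return nil }
        let sum = points.reduce(CGPoint.zero) { CGPoint(x: $0.x + $1.x, y: $0.y + $1.y) }
        return CGPoint(x: sum.x / CGFloat(points.count), y: sum.y / CGFloat(points.count))
    }
}
```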

Figure 5. dlib's 68 landmarks

The picture clearly shows the structure of the face. However, the number of landmarks depends on the chosen library: there are solutions with 4, 16, 64, 124, or more landmarks.
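For dlib's 68-point model, the indices follow the iBUG 300-W annotation scheme, so a flat array of 68 points can be split into regions by fixed index ranges. A small sketch; the region names follow a widely used convention in which "right" means the subject's right, and `dlib68Regions` and `splitDlib68` are illustrative names.

```swift
// Index ranges of dlib's 68-point shape predictor (iBUG 300-W scheme)
let dlib68Regions: [(name: String, range: Range<Int>)] = [
    ("jaw", 0..<17),
    ("rightEyebrow", 17..<22),
    ("leftEyebrow", 22..<27),
    ("nose", 27..<36),
    ("rightEye", 36..<42),
    ("leftEye", 42..<48),
    ("mouth", 48..<68)  // outer lip 48...59, inner lip 60...67
]

/// Splits a flat array of 68 landmarks into named regions
func splitDlib68<Point>(_ points: [Point]) -> [String: [Point]] {
    precondition(points.count == 68, "expected 68 dlib landmarks")
    var result: [String: [Point]] = [:]
    for region in dlib68Regions {
        result[region.name] = Array(points[region.range])
    }
    return result
}
```

The function is generic over the point type, so it works equally for `CGPoint` values or for plain landmark indices.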

Delaunay triangulation to build a mask

Let's move on to the practical part and try to build a simple mask on the face from the landmarks we obtained. The expected result is a mask like this:

Figure 6. Mask visualizing the Delaunay triangulation algorithm

Delaunay triangulation is a triangulation of a set of points S in the plane such that, for every triangle, all points of S other than its vertices lie outside the triangle's circumcircle. It was first described in 1934 by the Soviet mathematician Boris Delaunay (Delone).
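The empty-circumcircle condition from this definition reduces to the sign of a 3×3 determinant, the classic "incircle" predicate. A minimal sketch (the function name `isInsideCircumcircle` is illustrative), assuming the triangle's vertices a, b, c are given in counterclockwise order:

```swift
import Foundation

/// Returns true if `d` lies strictly inside the circumcircle of triangle (a, b, c).
/// Assumes a, b, c are in counterclockwise order; for clockwise order the sign flips.
func isInsideCircumcircle(_ a: CGPoint, _ b: CGPoint, _ c: CGPoint, _ d: CGPoint) -> Bool {
    let adx = a.x - d.x, ady = a.y - d.y
    let bdx = b.x - d.x, bdy = b.y - d.y
    let cdx = c.x - d.x, cdy = c.y - d.y
    let ad = adx * adx + ady * ady
    let bd = bdx * bdx + bdy * bdy
    let cd = cdx * cdx + cdy * cdy
    // Determinant of the rows (dx, dy, dx² + dy²), expanded along the first row
    let det = adx * (bdy * cd - bd * cdy)
            - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx)
    return det > 0
}
```

A triangulation is Delaunay exactly when this test returns false for every triangle and every other point of S; points lying exactly on the circle (determinant 0) are the degenerate case where several valid triangulations exist.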

Figure 7. An example of the Delaunay triangulation. A circle is generated from each point, passing through the two nearest ones in the Euclidean metric

Practical implementation of the algorithm

Let's implement Delaunay triangulation for the face captured by the camera.

Step 1. The implementation is a wrapper that takes an array of points in 2D space and returns an array of triangles. Each triangle stores its three vertices:

public final class Triangle {
    public var vertex1: Vertex
    public var vertex2: Vertex
    public var vertex3: Vertex

    public init(vertex1: Vertex, vertex2: Vertex, vertex3: Vertex) {
        self.vertex1 = vertex1
        self.vertex2 = vertex2
        self.vertex3 = vertex3
    }
}

Vertex is a wrapper around CGPoint that also stores the index of the corresponding landmark.

import CoreGraphics

public final class Vertex {
    public let point: CGPoint
    // Landmark index: 0...67 for the 68 dlib points; Vision provides 65 points
    public let identifier: Int

    public init(point: CGPoint, id: Int) {
        self.point = point
        self.identifier = id
    }
}

Step 2. Now let's draw polygons on the face. First, turn on the camera and display its feed on the screen:

import UIKit
import AVFoundation
import Vision

final class ViewController: UIViewController {
    private var session: AVCaptureSession?
    private let faceDetection = VNDetectFaceRectanglesRequest()
    private let faceLandmarks = VNDetectFaceLandmarksRequest()
    private let faceLandmarksDetectionRequest = VNSequenceRequestHandler()
    private let faceDetectionRequest = VNSequenceRequestHandler()

    private lazy var previewLayer: AVCaptureVideoPreviewLayer? = {
        guard let session = self.session else { return nil }
        let previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.videoGravity = .resizeAspectFill
        return previewLayer
    }()

    // TriangleView is the project's own overlay view that draws the triangles (not shown here)
    private lazy var triangleView: TriangleView = {
        TriangleView(frame: view.bounds)
    }()

    private var frontCamera: AVCaptureDevice? = {
        AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front)
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        sessionPrepare()
        session?.startRunning()
        guard let previewLayer = previewLayer else { return }
        view.layer.addSublayer(previewLayer)
        // Added after the preview layer, so the overlay is drawn on top of the camera feed
        view.addSubview(triangleView)
    }

    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        previewLayer?.frame = view.bounds
    }

    private func sessionPrepare() {
        session = AVCaptureSession()
        guard let session = session, let captureDevice = frontCamera else { return }
        do {
            let deviceInput = try AVCaptureDeviceInput(device: captureDevice)
            session.beginConfiguration()
            if session.canAddInput(deviceInput) {
                session.addInput(deviceInput)
            }
            let output = AVCaptureVideoDataOutput()
            output.videoSettings = [
                String(kCVPixelBufferPixelFormatTypeKey): Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
            ]
            // Drop frames that arrive while the previous one is still being processed
            output.alwaysDiscardsLateVideoFrames = true
            if session.canAddOutput(output) {
                session.addOutput(output)
            }
            session.commitConfiguration()
            // Frames are delivered to the delegate on this serial background queue
            let queue = DispatchQueue(label: "output.queue")
            output.setSampleBufferDelegate(self, queue: queue)
            print("setup delegate")
        } catch {
            print("can't setup session")
        }
    }
}

Step 3. Next, we get frames from the camera:

Figure 8. An example of the received frame from the camera

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        guard let attachments = CMCopyDictionaryOfAttachments(allocator: kCFAllocatorDefault,
                                                              target: sampleBuffer,
                                                              attachmentMode: kCMAttachmentMode_ShouldPropagate) as? [CIImageOption: Any] else { return }
        let ciImage = CIImage(cvImageBuffer: pixelBuffer, options: attachments)
        // leftMirrored for the front camera.
        // Note: UIImage.Orientation raw values are not EXIF codes (leftMirrored.rawValue is 6, EXIF 6 means .right)
        let ciImageWithOrientation = ciImage.oriented(forExifOrientation: Int32(UIImage.Orientation.leftMirrored.rawValue))
        detectFace(on: ciImageWithOrientation)
    }
}
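Frame orientation is a common source of bugs here: `oriented(forExifOrientation:)` expects an EXIF orientation code (1 to 8), while `UIImage.Orientation` raw values run from 0 to 7 in a different order. The table below lists the EXIF codes as defined by `CGImagePropertyOrientation`; it is written as a plain Swift enum with the hypothetical name `UIOrientationCase` so it compiles without UIKit. In a real app you can pass `CGImagePropertyOrientation` directly, e.g. `ciImage.oriented(.leftMirrored)`.

```swift
// Mirrors the cases of UIImage.Orientation, in the same raw-value order (0...7)
enum UIOrientationCase: Int {
    case up, down, left, right, upMirrored, downMirrored, leftMirrored, rightMirrored

    /// EXIF orientation code (the raw value of CGImagePropertyOrientation)
    var exifCode: Int32 {
        switch self {
        case .up: return 1
        case .upMirrored: return 2
        case .down: return 3
        case .downMirrored: return 4
        case .leftMirrored: return 5
        case .right: return 6
        case .rightMirrored: return 7
        case .left: return 8
        }
    }
}
```

So the `leftMirrored` raw value used in Step 3 is 6, which EXIF interprets as `.right`, not `.leftMirrored` (5). If you switch to the correct code, re-check the coordinate flip applied to the landmarks afterwards.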

Step 4. Look for faces in the frame:

// ViewController method
fileprivate func detectFace(on image: CIImage) {
    try? faceDetectionRequest.perform([faceDetection], on: image)
    // Run landmark detection only when at least one face was found
    if let results = faceDetection.results as? [VNFaceObservation], !results.isEmpty {
        faceLandmarks.inputFaceObservations = results
        detectLandmarks(on: image)
    }
}

Step 5. Look for landmarks on the face:

Figure 9. Example of landmarks found on the face.

// ViewController methods
private func detectLandmarks(on image: CIImage) {
    try? faceLandmarksDetectionRequest.perform([faceLandmarks], on: image)
    guard let landmarksResults = faceLandmarks.results as? [VNFaceObservation] else { return }
    let screenSize = UIScreen.main.bounds.size
    for observation in landmarksResults {
        // The bounding box is normalized (0...1); scale it to screen points.
        // Use each observation's own box, so that several faces are handled correctly.
        let faceBoundingBox = VNImageRectForNormalizedRect(observation.boundingBox,
                                                           Int(screenSize.width), Int(screenSize.height))
        var vertices = [Vertex]()
        for (index, element) in convertPointsForFace(observation.landmarks?.allPoints, faceBoundingBox).enumerated() {
            // Vision's origin is bottom-left; flip both axes to map into screen space
            let point = CGPoint(x: screenSize.width - element.point.x,
                                y: screenSize.height - element.point.y)
            vertices.append(Vertex(point: point, id: index))
        }
        // Frames arrive on a background queue; UIKit must be updated on the main thread
        DispatchQueue.main.async { [vertices] in
            self.triangleView.recalculate(vertexes: vertices)
        }
    }
}

private func convertPointsForFace(_ landmark: VNFaceLandmarkRegion2D?, _ boundingBox: CGRect) -> [Vertex] {
    guard let points = landmark?.normalizedPoints else { return [] }
    // Landmark points are normalized relative to the face bounding box
    return points.map { point in
        let pointX = point.x * boundingBox.width + boundingBox.origin.x
        let pointY = point.y * boundingBox.height + boundingBox.origin.y
        return Vertex(point: CGPoint(x: pointX, y: pointY), id: 0)
    }
}
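The coordinate math in `convertPointsForFace`, together with the axis flip applied afterwards, can be isolated into a pure function that is easy to unit-test without a camera. A sketch, assuming points normalized to the face bounding box (as Vision's `normalizedPoints` are) and the same flip of both axes as in the step above; `screenPoints` is a hypothetical name.

```swift
import Foundation

/// Maps landmark points normalized to `faceBox` into screen coordinates,
/// then flips both axes the same way as the Step 5 code.
func screenPoints(normalized: [CGPoint], faceBox: CGRect, screen: CGSize) -> [CGPoint] {
    return normalized.map { p in
        // Scale into the face box, then offset by its origin
        let x = p.x * faceBox.size.width + faceBox.origin.x
        let y = p.y * faceBox.size.height + faceBox.origin.y
        return CGPoint(x: screen.width - x, y: screen.height - y)
    }
}
```

For example, with a face box at (100, 200) of size 50×80 on a 375×667 screen, the box center (0.5, 0.5) ends up at (250, 427).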

Step 6. Finally, draw the mask on top. We take the triangles produced by the Delaunay algorithm and draw them as layers.

Figure 10. The final result is the simplest mask over the face.

The full implementation of the Delaunay triangulation algorithm in Swift is available here.
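For readers who want to see the algorithm itself, here is a compact sketch of the Bowyer-Watson method, one standard way to compute a Delaunay triangulation (not necessarily the approach used by the linked implementation). It returns triangles as index triples into the input array, which fits the idea of reusing precomputed triangles across frames. It runs in O(n²), which is fine for about 68 landmarks; `delaunay`, `IndexTriangle`, and `Edge` are hypothetical names.

```swift
import Foundation

/// A triangle as three indices into the input point array
struct IndexTriangle: Hashable {
    let a: Int, b: Int, c: Int
}

/// Undirected edge with a canonical vertex order, so (i, j) == (j, i)
struct Edge: Hashable {
    let u: Int, v: Int
    init(_ p: Int, _ q: Int) { u = min(p, q); v = max(p, q) }
}

/// True if d is strictly inside the circumcircle of (a, b, c), for either winding order
func inCircumcircle(_ a: CGPoint, _ b: CGPoint, _ c: CGPoint, _ d: CGPoint) -> Bool {
    let adx = a.x - d.x, ady = a.y - d.y
    let bdx = b.x - d.x, bdy = b.y - d.y
    let cdx = c.x - d.x, cdy = c.y - d.y
    let ad = adx * adx + ady * ady, bd = bdx * bdx + bdy * bdy, cd = cdx * cdx + cdy * cdy
    let det = adx * (bdy * cd - bd * cdy) - ady * (bdx * cd - bd * cdx) + ad * (bdx * cdy - bdy * cdx)
    let orientation = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return orientation > 0 ? det > 0 : det < 0
}

/// Bowyer-Watson Delaunay triangulation (sketch). Returns index triples into `points`.
func delaunay(_ points: [CGPoint]) -> [IndexTriangle] {
    let n = points.count
    guard n >= 3 else { return [] }
    // Super triangle that comfortably contains every input point
    let xs = points.map { $0.x }, ys = points.map { $0.y }
    let minX = xs.min()!, maxX = xs.max()!, minY = ys.min()!, maxY = ys.max()!
    let span = max(maxX - minX, maxY - minY) * 10 + 1
    let midX = (minX + maxX) / 2, midY = (minY + maxY) / 2
    let all = points + [CGPoint(x: midX - 2 * span, y: midY - span),
                        CGPoint(x: midX, y: midY + 2 * span),
                        CGPoint(x: midX + 2 * span, y: midY - span)]
    var triangles = [IndexTriangle(a: n, b: n + 1, c: n + 2)]

    for i in 0..<n {
        let p = all[i]
        // "Bad" triangles have p inside their circumcircle and are removed
        var kept: [IndexTriangle] = []
        var edgeCount: [Edge: Int] = [:]
        for t in triangles {
            if inCircumcircle(all[t.a], all[t.b], all[t.c], p) {
                for e in [Edge(t.a, t.b), Edge(t.b, t.c), Edge(t.c, t.a)] {
                    edgeCount[e, default: 0] += 1
                }
            } else {
                kept.append(t)
            }
        }
        // Edges used by exactly one bad triangle form the cavity boundary; connect them to p
        for (edge, count) in edgeCount where count == 1 {
            kept.append(IndexTriangle(a: edge.u, b: edge.v, c: i))
        }
        triangles = kept
    }
    // Discard triangles that touch a super-triangle vertex
    return triangles.filter { $0.a < n && $0.b < n && $0.c < n }
}
```

Because the super triangle is finite, very elongated point sets can occasionally lose a thin triangle on the convex hull; for face landmarks, which are compact, this is rarely an issue.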

A couple of optimization tips for advanced readers. Creating new layers on every frame is expensive. Recomputing the triangles with the Delaunay algorithm on every frame is expensive too. So instead we take a high-resolution, good-quality photo of a face looking straight into the camera and run Delaunay triangulation on it once. The resulting triangles are saved to a text file; from then on we reuse these triangles and only update the coordinates of their vertices.
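A minimal sketch of that optimization, assuming the triangles are stored as landmark index triples, one triangle per line (this text format and the names `serializeTriangles`, `parseTriangles`, and `positionedTriangles` are hypothetical): the file is produced once from the reference photo, and at runtime each index is simply looked up in the current frame's landmarks.

```swift
import Foundation

/// One line per triangle: "i j k" (landmark indices)
func serializeTriangles(_ triangles: [(Int, Int, Int)]) -> String {
    return triangles.map { "\($0.0) \($0.1) \($0.2)" }.joined(separator: "\n")
}

/// Parses the format above, skipping malformed lines
func parseTriangles(_ text: String) -> [(Int, Int, Int)] {
    return text.split(separator: "\n").compactMap { line -> (Int, Int, Int)? in
        let parts = line.split(separator: " ").compactMap { Int($0) }
        return parts.count == 3 ? (parts[0], parts[1], parts[2]) : nil
    }
}

/// Places the stored triangles on the current frame's landmarks; no re-triangulation needed
func positionedTriangles(_ triangles: [(Int, Int, Int)], landmarks: [CGPoint]) -> [[CGPoint]] {
    return triangles.compactMap { t -> [CGPoint]? in
        // Skip triangles that reference a landmark this frame does not have
        guard [t.0, t.1, t.2].allSatisfy({ landmarks.indices.contains($0) }) else { return nil }
        return [landmarks[t.0], landmarks[t.1], landmarks[t.2]]
    }
}
```

Since the landmark indices are stable between frames, the per-frame cost drops to a few array lookups per triangle.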

What are masks

MSQRD, Snapchat, VK, even Avito - everyone uses masks.

Figure 11. Examples of Snapchat masks

Implementing the simplest version of a mask is easy. Take the landmarks obtained above, choose the mask you want to apply, and place the landmarks on it. There are simple 2D projections and more complex 3D masks. In both cases, you compute a transformation that maps the mask's vertices onto the frame, so that the landmarks for the ears correspond to the ears of the mask. Then simply track the new positions of the face landmarks and update the mask accordingly.
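One common way to make a 2D mask follow the face (a swapped-in technique, not necessarily what the apps above use) is piecewise affine warping: the mask is triangulated with the same landmark indices as the face, and for each triangle you compute the affine transform that maps the mask triangle onto the face triangle. A sketch of that per-triangle step, with the hypothetical names `Affine` and `affineMapping`:

```swift
import Foundation

/// 2D affine transform: x' = a*x + b*y + tx, y' = c*x + d*y + ty
struct Affine {
    var a, b, c, d, tx, ty: CGFloat

    func apply(_ p: CGPoint) -> CGPoint {
        return CGPoint(x: a * p.x + b * p.y + tx, y: c * p.x + d * p.y + ty)
    }
}

/// Affine transform mapping triangle `src` onto triangle `dst` (vertex i goes to vertex i).
/// Returns nil for a degenerate (zero-area) source triangle.
func affineMapping(from src: [CGPoint], to dst: [CGPoint]) -> Affine? {
    precondition(src.count == 3 && dst.count == 3)
    let u1 = CGPoint(x: src[1].x - src[0].x, y: src[1].y - src[0].y)
    let u2 = CGPoint(x: src[2].x - src[0].x, y: src[2].y - src[0].y)
    let v1 = CGPoint(x: dst[1].x - dst[0].x, y: dst[1].y - dst[0].y)
    let v2 = CGPoint(x: dst[2].x - dst[0].x, y: dst[2].y - dst[0].y)
    let det = u1.x * u2.y - u2.x * u1.y
    guard abs(det) > 1e-12 else { return nil }
    // The linear part L solves L * [u1 u2] = [v1 v2]
    let a = (v1.x * u2.y - v2.x * u1.y) / det
    let b = (v2.x * u1.x - v1.x * u2.x) / det
    let c = (v1.y * u2.y - v2.y * u1.y) / det
    let d = (v2.y * u1.x - v1.y * u2.x) / det
    // The translation makes src[0] land exactly on dst[0]
    let tx = dst[0].x - (a * src[0].x + b * src[0].y)
    let ty = dst[0].y - (c * src[0].x + d * src[0].y)
    return Affine(a: a, b: b, c: c, d: d, tx: tx, ty: ty)
}
```

If you feed the result into `CGAffineTransform`, note that its fields are laid out differently (x' = a·x + c·y + tx, y' = b·x + d·y + ty), so `b` and `c` swap places.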

Creating masks involves some hard problems. One of them is rendering complexity. Landmark jumps are even harder: when landmarks jump, the mask gets distorted and behaves unpredictably. And since capturing frames with a phone camera is a chaotic process, with rapid changes in lighting, shadows, sudden jerks, and so on, the task becomes very labor-intensive. Another challenge is building complex masks.

As entertainment or a simple exercise, this is interesting. But as in other areas, if you want to solve serious problems, you will have to invest time in learning.
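One simple and commonly used remedy for the landmark jumps mentioned above (not described in the source) is to smooth each landmark over time with an exponential moving average: the rendered point moves only a fraction `alpha` of the way toward each new detection. A sketch with a hypothetical `LandmarkSmoother` type; a smaller `alpha` means less jitter but more lag behind fast head movements.

```swift
import Foundation

/// Exponential moving average over landmark positions (sketch)
struct LandmarkSmoother {
    let alpha: CGFloat          // 0 < alpha <= 1; 1 disables smoothing
    private(set) var current: [CGPoint] = []

    init(alpha: CGFloat) {
        precondition(alpha > 0 && alpha <= 1)
        self.alpha = alpha
    }

    /// Feeds a new frame's landmarks and returns the smoothed positions
    mutating func update(with points: [CGPoint]) -> [CGPoint] {
        // Restart when the landmark count changes (e.g. a new face or a different model)
        guard current.count == points.count else {
            current = points
            return current
        }
        current = zip(current, points).map { old, new in
            CGPoint(x: old.x + alpha * (new.x - old.x), y: old.y + alpha * (new.y - old.y))
        }
        return current
    }
}
```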

In the next article

Recognizing images, faces, license plates, gender, and age is becoming increasingly popular. IT companies in this market introduce technologies for these problems gradually, without users noticing. China plans to invest 150 billion in machine learning in the coming years to become the leader in this field.

In the next article, I will explain how to identify a specific person from a detected face and how to filter out blurry photos before identification.

Source: https://habr.com/ru/post/343514/
