Over time, our understanding of how to interact with the computer changes. Touchpads and touch screens have firmly entered our lives alongside the "classic" keyboard and mouse, but this is not the last stage in the evolution of input tools. With the advent of augmented reality devices such as Google Glass, there is a need for interfaces that fit harmoniously into this concept. The prerequisites for such interfaces already exist: devices such as the Intel Creative Camera, Microsoft Kinect, and Leap Motion have appeared, in which the main control element is the user's hands. One of the fundamental algorithmic tasks for interacting with such devices is therefore detecting the user's hands and fingers and reconstructing their spatial location.
This article discusses one way of solving the problem of detecting palms and fingers.
Formulation of the problem
By detecting hands and fingers we mean finding points from which the position of the palm in the plane and its pose can be reconstructed. It is natural to use the center of mass of the palm and the points describing the fingertips as such points.
Algorithm Description
Consider a contour describing the silhouette of a palm:
Search for a characteristic point of the palm
First, we define a point that will serve as a descriptor of the palm. As mentioned above, we will use the center of mass of the contour. To find it, we need to compute the spatial moments. A moment is a characteristic of the contour, computed by integrating (or summing) over all of its pixels. In general, the moment of order (p, q) can be written as:

m_{pq} = \sum_{x}\sum_{y} x^p \, y^q \, I(x, y),

where I(x, y) is the intensity of the pixel (x, y).
Then the coordinates of the center of mass can be written as:

\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}
The approximate placement of the center of mass is marked with a red dot on the image.
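As a minimal illustration of this step (a sketch of mine, not the article's code), the computation with OpenCV looks roughly like this; the function name is my own:

```python
import cv2

def palm_center(contour):
    """Center of mass (x, y) of a contour, from its spatial moments."""
    m = cv2.moments(contour)
    if m["m00"] == 0:  # degenerate contour with zero area
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```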
Search for fingertip points
Now consider the parts of the contour corresponding to the fingers.
For each point P[n] of the contour, we will also consider the points P[n-r] and P[n+r], where r is some positive number (r < n).
These three points form an angle; consider it:
As can be seen from the image, the contour corresponding to the silhouette of the fingers contains two types of points:
1) Points lying on straight segments (corresponding to the straight edges of a finger). The angle P[n-r]P[n]P[n+r] is obtuse.
2) Points lying on arcs (corresponding to the fingertips and the valleys between the fingers). The angle P[n-r]P[n]P[n+r] is acute.
We are interested in points of the second type, since they describe the tips of the fingers.
As is known from calculus, the cosine is monotonically decreasing on [0, \pi]: for angles \alpha_1 < \alpha_2 in this interval, \cos\alpha_1 > \cos\alpha_2. Therefore, as points describing the fingertips, we will look for type 2 points whose cosine of the angle P[n-r]P[n]P[n+r] is maximal in their neighborhood.
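A minimal sketch of this computation (the function name and the wrap-around indexing are my assumptions; the contour is in the format returned by OpenCV's findContours):

```python
import numpy as np

def angle_cosine(contour, n, r):
    """Cosine of the angle P[n-r] P[n] P[n+r] at contour point n."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    p_prev = pts[(n - r) % len(pts)]  # wrap around: the contour is closed
    p = pts[n % len(pts)]
    p_next = pts[(n + r) % len(pts)]
    v1 = p_prev - p  # vector P[n] -> P[n-r]
    v2 = p_next - p  # vector P[n] -> P[n+r]
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(np.dot(v1, v2) / denom) if denom else -1.0
```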
But, as can be seen from the figure above, type 2 points correspond not only to the fingertips but also to the valleys between the fingers. To determine whether a point is a fingertip, we use a property of the contour traversal. Suppose we traverse the contour pixels clockwise; then at the fingertips the segment P[n]P[n+r] makes a right turn relative to P[n-r]P[n], while at the points lying in the valleys between the fingers it makes a left turn.
To determine whether the three points P[n-r], P[n], P[n+r] form a right turn, one can use the generalization of the vector (cross) product to two-dimensional space; the right-turn condition looks like this:

(x_n - x_{n-r})(y_{n+r} - y_n) - (y_n - y_{n-r})(x_{n+r} - x_n) < 0

(in image coordinates, where the y axis points down, the sign of the inequality flips).
Thus we obtain the coordinates of the points corresponding to the fingertips.
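Putting the two tests together, a fingertip detector over the whole contour might look like the sketch below; the value of r, the cosine threshold, and the sign convention (image coordinates, clockwise traversal) are my assumptions, not values from the article:

```python
def is_right_turn(p_prev, p, p_next):
    """z-component of the 2D cross product of P[n-r]P[n] and P[n]P[n+r].

    In image coordinates (y axis points down) a clockwise traversal
    gives a positive value at a right turn; flip the comparison if
    your contour is traversed the other way."""
    cross = (p[0] - p_prev[0]) * (p_next[1] - p[1]) \
          - (p[1] - p_prev[1]) * (p_next[0] - p[0])
    return cross > 0

def find_fingertips(contour, r=16, cos_min=0.7):
    """Contour points with an acute angle (large cosine) and a right
    turn of the contour: the fingertip candidates."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    tips = []
    for n in range(len(pts)):
        p_prev = pts[(n - r) % len(pts)]
        p = pts[n]
        p_next = pts[(n + r) % len(pts)]
        c = angle_cosine(contour, n, r)
        if c > cos_min and is_right_turn(p_prev, p, p_next):
            tips.append((float(p[0]), float(p[1])))
    # In practice, neighbouring detections should be merged, keeping
    # only the point with the locally maximal cosine.
    return tips
```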
Algorithm implementation
Generally speaking, the algorithm described above will work on the video stream of an ordinary webcam, but then cleanly separating the foreground from the background becomes a problem. To avoid it, an RGB-D sensor (Microsoft Kinect) was used: instead of subtracting the background, one can simply limit the working distance by a threshold cut-off in depth. Kinect is, in general, not especially well suited to this task, since its minimum working distance is about 40 cm, which imposes significant restrictions on its placement; but it is still better than nothing. OpenNI was used as the driver for working with Kinect.
The OpenCV library was also used to simplify working with Kinect and contour processing.
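To make the segmentation step concrete, here is a rough sketch, assuming the depth map is already available as a single-channel array in millimetres (obtaining it from Kinect via OpenNI is outside the scope of the sketch; the distance limits and the area threshold are arbitrary values of mine):

```python
import cv2
import numpy as np

def hand_contours(depth_mm, near=400, far=700, min_area=1000):
    """Keep everything between `near` and `far` millimetres and return
    the large external contours, assumed to belong to hands."""
    # A threshold cut-off in depth replaces background subtraction.
    mask = cv2.inRange(depth_mm, near, far)
    # CHAIN_APPROX_NONE keeps every contour pixel, which the per-point
    # angle test above relies on.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    return [c for c in contours if cv2.contourArea(c) > min_area]
```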
Experimental results
An example frame produced while the algorithm is running:
An example video of the algorithm in operation (no tracking is used; hands and fingers are detected anew in each frame):