
How we saved our eyes with OpenCV

This post has been delayed for four months. We are a young development team still learning not to blow deadlines, but it seems we are finally turning the corner. The backstory is in this article, where we promised to publish a sequel. This story is about how our application works (or does not work, that is up to the reader).

Which app? We are the team behind the Viewaide project (formerly EyeDoc), and we write software that uses a webcam to measure eye-fatigue parameters and show notifications aimed at reducing the risk of vision deterioration from long hours at the monitor. Better to see once than to hear a hundred times.


You can download and try it at this link, as they say, "free and without SMS". Besides the desktop software we also have a web-service component, but first things first.

How painful it was for us to realize that the monitor harms the eyes is described in the previous article. In short, we started building this not for profit (though nobody would turn it down), but to solve our own problem. A year ago, when we were just starting work on the project, the problem was acute: one co-founder could barely make out an approaching minibus, and the second's eyesight was rapidly catching up with the first's. To back this up, a few numbers.



Computer vision syndrome (CVS) is a temporary condition resulting from prolonged, continuous focusing of the eyes on a display.

According to some sources, about 75% of people who work at a computer for more than 3 hours a day show the visual symptoms of this syndrome.

CVS develops from eye strain and leads to headaches, blurred vision, redness, tension, fatigue, dry eyes and, as a result, loss of visual acuity. It may also set the stage for more serious eye diseases. What more is there to say, when about 40 million Americans suffer from dry eye syndrome, and dry-eye symptoms are the number one reason for visiting an ophthalmologist in the United States.

Having studied the material on this topic in more depth and talked with ophthalmologists, we learned that eye fatigue can be judged by several parameters: blink frequency, squinting, the distance to the screen, and the lighting of the workplace (the very parameters the tasks below deal with).

These are the factors by which one can tell whether a person's eyes are tired, and ignoring these little things only aggravates the situation further.

Frankly, the idea came not from meticulous market research and analysis of current IT trends, but from the thought: "What could we code up that's really cool?" A year ago, image processing seemed cool. Then we thought: "The webcam sits right in front of our noses all day; is it really only good for Skype calls?" We wanted to create something not only cool but also useful. So the idea appeared: if the eyes are always in the camera's view, then we only need to process their image properly, taking into account various factors (the eye-fatigue parameters listed above), and maybe something will come of it.

The choice of development tools fell on Qt + OpenCV. We liked Qt not only for its pretty green color, convenient code editor and syntax highlighting, but also because it offered a chance to build cross-platform applications for people with a C++ background. Now there is hope that we will also be able to write full-fledged mobile applications. Wait and see. If Qt was responsible for the GUI, something else was needed for working with images. OpenCV is a library written by computer vision adepts at Intel; it has very detailed documentation and plenty of examples on the web. Oh yes, all these development tools are free, including for commercial use, which is a big plus.

Task №1. Getting images of the eyes from the webcam.


Before you can do anything with an image, you first have to acquire it, and in our case it is enough to get an image of the eyes alone. We grab the streaming video from the webcam; there are plenty of examples of this on the net.
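For completeness, a minimal frame-grabbing sketch in the old OpenCV C API used throughout this post (error handling omitted):

    #include <opencv/highgui.h>

    CvCapture* capture = cvCaptureFromCAM(0);  // 0 = default webcam
    IplImage* frame = cvQueryFrame(capture);   // grab one BGR frame
    // ... process `frame`; cvQueryFrame owns the buffer, do not release it
    cvReleaseCapture(&capture);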

Object detection in OpenCV is implemented with the Viola-Jones method. Pattern recognition is not a field prone to frequent upheavals: the method was introduced in 2001, and 13 years later it still leads in its niche. In short, the method slides so-called Haar features (elementary combinations of dark and light areas) across the image, and if a region is found in which enough features match, the object is considered found. To decide whether an area is light or dark, the values of its pixels have to be summed up. To avoid doing this over and over while scanning, the image is first converted to an integral representation. Yet despite all the method's speed optimizations, finding a face in real-time video on a budget PC seems out of reach. Or is it?
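Before any of the detection calls below will run, the Haar cascades and a memory storage have to be set up. A minimal sketch (the cascade file names here are the stock ones shipped with OpenCV and are our assumption, not something fixed by the application):

    #include <opencv/cv.h>

    // Load the pre-trained cascades and allocate storage for results.
    CvHaarClassifierCascade* face_cascade =
        (CvHaarClassifierCascade*)cvLoad("haarcascade_frontalface_alt.xml");
    CvHaarClassifierCascade* left_eye_cascade =
        (CvHaarClassifierCascade*)cvLoad("haarcascade_lefteye_2splits.xml");
    CvMemStorage* storage = cvCreateMemStorage(0);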



So, the task is to find the user's eyes in the webcam image. Take a resolution of 640x480.

 cvHaarDetectObjects(frame,left_eye_cascade,storage,1.1,3,CV_HAAR_DO_CANNY_PRUNING,cvSize(22,22)); 

This call is meant to find the left eye. If you attack the problem head-on like this, the speed is awful and the processor load is far too high for a background application. On top of that, there are a lot of false positives. So we decided to start with a face search.

 cvHaarDetectObjects(frame,face_cascade,storage,1.1,3,CV_HAAR_DO_CANNY_PRUNING,cvSize(80,80)); 

This already works noticeably faster. What changed? The last argument of the function sets the minimum size of the object to search for. The numerous OpenCV examples state that for eyes you should search for regions of at least 22x22 pixels, while for a face 80x80 will do. Now it is clear why the face search is faster: scanning the whole image with a 22x22 window is nowhere near as fast as with an 80x80 one. For comparison, on my Intel Core 2 Duo 2.2 GHz machine with 2 GB of RAM, the eye search takes 900 milliseconds on average, and the face search 200 milliseconds.

As we know, a face is not square (unless our user is SpongeBob). Let's make the minimum search area rectangular. With these numbers, the running time drops to 160 milliseconds.

 cvHaarDetectObjects(frame,face_cascade,storage,1.1,3,CV_HAAR_DO_CANNY_PRUNING,cvSize(80,120)); 

What about the fourth argument, equal to 1.1? That is the step by which the "search window" grows. In other words, if no faces of size 80x120 were found, the window size is multiplied by 1.1. And what if we grow it not by 10% but by 20%? Not bad: the running time is down to 100 milliseconds.

 cvHaarDetectObjects(frame,face_cascade,storage,1.2,3,CV_HAAR_DO_CANNY_PRUNING,cvSize(80,120)); 

And one more thought: why do we need the full 640x480 image? If we shrink it by half (to 320x240), we can search for the object faster. After downscaling, the face is still perfectly visible. Logically enough, the speed doubles: the running time is 50 milliseconds.




 cvHaarDetectObjects(frame,face_cascade,storage,1.2,3,CV_HAAR_DO_CANNY_PRUNING,cvSize(40,60)); 
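The downscaling step itself is a single call; a sketch, assuming `frame` is the original 640x480 capture:

    // Halve the resolution before detection; the minimum face size
    // shrinks accordingly (80x120 becomes 40x60).
    IplImage* small_frame = cvCreateImage(
        cvSize(frame->width / 2, frame->height / 2),
        frame->depth, frame->nChannels);
    cvResize(frame, small_frame, CV_INTER_LINEAR);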

The face search now works more or less tolerably, but we still need the eyes. There is no point in searching the whole face for them; it is better to pick out the regions where they can realistically be located. The idea is easy to illustrate.



We will now search for the eyes only inside these regions. You will agree that the amount of work is far smaller than when scanning the entire image; besides, the probability of a false positive drops sharply.
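One possible way to carve those regions out of the detected face rectangle (the exact proportions below are our illustrative assumption; any sensible split of the upper half of the face will do):

    // Return the search region for one eye inside a detected face.
    CvRect eyeRegion(CvRect face, bool leftSide) {
        int w = (int)(face.width * 0.35);   // each region ~1/3 of face width
        int h = (int)(face.height * 0.3);   // eyes sit in the upper half
        int x = face.x + (int)(face.width * (leftSide ? 0.10 : 0.55));
        int y = face.y + (int)(face.height * 0.2);
        return cvRect(x, y, w, h);
    }
    // Usage: cvSetImageROI(frame, eyeRegion(face, true)); detect; cvResetImageROI(frame);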

Do we need to search for the face in every frame? Hardly. The main goal is to find the eyes. To spare the processor, we complicate our own lives instead: once an eye is found, we take the rectangle containing it and double its area. We then apply this region to the next video frame, and voila: with a high enough fps (and we have just been speeding up per-frame processing), the eye is very likely to be found inside it.
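A sketch of that trick: double the last known eye rectangle around its center, clamp it to the frame, and run the detector only inside it on the next frame.

    // Expand an eye rectangle to twice its size, clamped to the frame.
    CvRect expandRect(CvRect eye, CvSize frameSize) {
        CvRect r = cvRect(eye.x - eye.width / 2, eye.y - eye.height / 2,
                          eye.width * 2, eye.height * 2);
        if (r.x < 0) { r.width += r.x; r.x = 0; }
        if (r.y < 0) { r.height += r.y; r.y = 0; }
        if (r.x + r.width > frameSize.width)   r.width  = frameSize.width - r.x;
        if (r.y + r.height > frameSize.height) r.height = frameSize.height - r.y;
        return r;
    }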



The screenshot shows the main stages of the optimized eye search. If the eyes are not found after several attempts, we fall back to the face search. The complete search routine now takes about 30 milliseconds of processor time on average, 30 times less than the head-on approach.

Task №2. Determining the distance to the monitor.


This article helped us enormously with this task. In it the author lays out some fairly simple geometry describing how a webcam sees the world:



Here we can see that triangles ABC and ECD are similar. Let us drop two altitudes from point C and strip away everything unnecessary.

Now, looking at the geometric model of the webcam, we see that the ratio AB/CF equals the ratio ED/CH. Computing this ratio is very simple: place a 10 cm ruler in front of the webcam so that it fits in the frame exactly from edge to edge, then measure the distance from the camera to the ruler. In our case it came out to 16 cm, so the ratio of distance to ruler length is 16/10 = 1.6.

If the face fills the frame completely, the distance is 24 cm (1.6 * 15). Why 15 cm? Because that is the average length of a human face, and the error from using the average is almost imperceptible (unless, again, you are SpongeBob). Now for the rest: we need the distance when the face occupies less than 100% of the frame. For that it is enough to know how many centimeters correspond to one percent: 24/100 = 0.24. So the distance to the monitor = 24 + (100% minus the percentage of the frame occupied by the face) * 0.24.

In practice we use not the face area as the reference but the distance between the eyes. This is because the eye search runs on every iteration of the application, while the face search runs only when needed. And in general, it's all about the eyes.
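Put together, the calculation looks roughly like this (the constants are the ones from our calibration above; the function name is ours):

    // Distance estimate from how much of the frame the face occupies.
    // 1.6 is the camera's distance-to-width ratio, 15 cm the average
    // face length, so a face filling the frame is ~24 cm away.
    double distanceToMonitorCm(double facePercentOfFrame) {
        const double fullFrameCm = 1.6 * 15.0;           // 24 cm
        const double cmPerPercent = fullFrameCm / 100.0; // 0.24
        return fullFrameCm + (100.0 - facePercentOfFrame) * cmPerPercent;
    }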

Task №3. Detecting squinting and blinking.


Squinting is, frankly, a very subjective parameter, and we might never have thought of it were it not for our parents' childhood remarks: "stop squinting, %username%". Nevertheless, its implementation is worth describing, since blink detection follows from it.

Question one: what distinguishes a fully open eye from a half-closed or closed one? The distance between the eyelids. So that is what we need to find. In the image, the eyelids appear as a thin border of color transition, so the solution is to find edges in the image.

As an aside, I would recommend a very interesting book: David Marr, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information". It describes a fascinating information-theoretic approach to human vision; if you follow the theory laid out there, even such marvels of sight as stereoscopy begin with finding edges in the image. The most popular edge detector is the Canny detector. Whether the images were too small, or our hands were crooked, or the moon was in the wrong phase, its results were inaccurate, so we had to dig a little deeper into the details.



We begin the classic edge search by applying a Gaussian filter to get rid of noise. In practice it turns out that the eye images are so small and pixelated that they need no additional blurring. The next step is the edge search itself. Edges are nothing but sharp color transitions, so for convenience we can convert the image to monochrome: the eyelid borders are a contrast transition either way, a black-and-white image captures that perfectly, and it is simpler to process.

To find a sharp brightness transition we use a differential operator (the Laplace operator suits this best; the Sobel operator is also used). Without going into details, why differential? If edges are sharp color transitions, then near a border the difference between the sums of neighboring pixel values must be larger than in areas of uniform color. Replace "sum of pixel values" with the word "function", and it is clear that we are interested in the places where it changes sharply. It is exactly for such situations that comrades Leibniz and Newton invented differential calculus.

 cvLaplace(eye, dst, 9); 

OpenCV has a ready-made implementation of this operator that is simple to apply: it is enough to pass the source image, the destination, and the size of the square pixel neighborhood over which the operator will "walk" across the image, performing its differential rite.

To remove unneeded regions and generally make life easier, we can binarize the image (in other words, leave only two colors, black and white, with no shades). For this we use threshold conversion: take the average pixel brightness, then turn everything below that value black and everything above it white.

 cvThreshold(dst,dst,threshold,255,CV_THRESH_BINARY); 
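Here `threshold` is our average. A sketch of obtaining it in one call, assuming `dst` holds the Laplacian result:

    // Mean brightness of the Laplacian image serves as the threshold.
    CvScalar mean = cvAvg(dst);
    double threshold = mean.val[0];
    cvThreshold(dst, dst, threshold, 255, CV_THRESH_BINARY);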

Next come several cosmetic improvements: clustering the white (edge) pixels to keep the largest clusters and discard false ones, and sorting the clusters by area (though that part is less important, and we don't want to bore the reader or bury them in code, especially if they have read this far, which is already nice). The process described above is illustrated step by step.



It remains only to count the number of pixels between the upper boundary and the lower one (the upper and lower eyelids) and compare it with the normal state saved during calibration.

Oh yes, and how do we detect blinks? Simple: a closed eye is characterized by its upper eyelid dropping below the midline of the eye, and that is exactly the condition we check.
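The check itself is a single comparison; a sketch with hypothetical variable names (remember that y grows downward in image coordinates):

    // The eye counts as closed when the detected upper-eyelid boundary
    // falls below the horizontal midline of the eye region.
    bool isEyeClosed(int upperEyelidY, int eyeTopY, int eyeHeight) {
        int midlineY = eyeTopY + eyeHeight / 2;
        return upperEyelidY > midlineY;  // y axis points down
    }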



Task №4. Determining the level of illumination.


The level of illumination also matters if you care about your vision. A workplace is poorly organized if you work in a dark room and a bright display strains your eyes beyond all measure. It is also better not to sit at the computer if another bright light source shines into your eyes from behind the screen. And don't forget the glare the monitor can produce from a light source located behind you. Solving that last problem completely with a webcam is hardly possible, but we did try to implement a partial solution.

So what needs to be done? First we divide the image into two parts: the face and everything else, that is, the background. We will compare the brightness of these two pictures. The image can simply be converted to monochrome, since it does not matter whether we consider the total brightness or the three channels (RGB) separately. The next step is to build histograms of the two images; we need them to get a picture of the overall brightness of each.

In the resulting histograms the horizontal axis represents brightness and the vertical axis the relative number of pixels with that brightness. Next we conditionally split the pixels into light and dark, giving four groups across the two images. By analyzing and comparing these four groups we can draw several conclusions: when dark pixels strongly dominate both the face and the background, we are working in a dark room where only the monitor emits light; when there are far more bright pixels on the face than on the background, some other light source behind the monitor is shining straight into our face, be it a desk lamp or the sun outside the window. An example of splitting an image into face and background, followed by the histograms, is shown below.
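A sketch of the comparison under our assumptions (a grayscale frame `gray` and a detected region `roi`; a two-bin histogram stands in for the full one, since only the light/dark split matters here):

    // Fraction of "bright" pixels inside a region of a grayscale image.
    double brightFraction(IplImage* gray, CvRect roi) {
        cvSetImageROI(gray, roi);
        int bins = 2;                       // bin 0 = dark, bin 1 = bright
        float range[] = {0, 256};
        float* ranges[] = {range};
        CvHistogram* hist = cvCreateHist(1, &bins, CV_HIST_ARRAY, ranges, 1);
        cvCalcHist(&gray, hist, 0, NULL);
        double dark = cvQueryHistValue_1D(hist, 0);
        double bright = cvQueryHistValue_1D(hist, 1);
        cvResetImageROI(gray);
        cvReleaseHist(&hist);
        return bright / (dark + bright + 1e-9);
    }
    // Compare brightFraction() over the face rectangle with the same
    // measure over the background to classify the lighting situation.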


Here the user is in a dark room and a desk lamp shines directly into the face, which is not ideal. It is fair to admit that this approach cannot account for every difference in light intensity, color and saturation. We tried several other methods, but histograms turned out to be the most effective way to estimate the level of illumination.

We cannot fail to mention how entertaining it is to test all this joy, and how sad it is when everything works "with a bang" at home but, the moment you demonstrate the application to friends, it detects an eye on someone's nose or somewhere worse. The algorithms clearly need constant refinement. But while a month into the work the results scared us with their inaccuracy, a year later we began using the application ourselves. It still has plenty of unexpected bugs, but from personal experience we can say that on the whole it works tolerably. Sometimes you start squinting because of unusual brightness; sometimes, absorbed in work, you lean toward the monitor without noticing it, and that is when Viewaide points out your carelessness. Now it is time to share the result of our work with the IT community, with those who think about possible vision deterioration or loss of productivity more often than others.

A full Windows version is currently available, along with a beta for Mac. Within a month we plan to release finished applications for Mac and Linux, and if the community is interested, we will publish a couple of articles on the subtleties of porting Qt applications from Windows to Mac and Linux.

We would appreciate any criticism. And if you have not downloaded Viewaide yet, now is the time.

Source: https://habr.com/ru/post/215771/
