
Nexus 5 + JavaScript + 48 hours = touch surface?

A few weeks ago the WTH.BY hackathon took place in Minsk, and I decided to take part. Its main idea was that this is a hackathon for developers: we could build anything, as long as it was fun and interesting for us. No monetization, no investors, no mentors. Everything for fun!

I had plenty of ideas for things to build, but none of them quite reached that "Wow!" feeling. So, on the eve of the event, I leafed through old Habr articles in the DIY section and came across "The experience of creating a multitouch table". That supplied the missing "Wow!", and I decided to build a distant analogue of it from whatever was at hand.

At hand I had a roughly A3-sized sheet of glass, plain paper, a marker, a mobile phone and a laptop. I quickly found myself an accomplice, Egor, and the work began in earnest.


In general, we decided to make a touch surface whose touches would be recognized by our system. For this I brought from home a piece of ordinary glass, paper and a marker. We set the glass on two piles of books, taped a sheet of paper on top of it, and put the phone underneath with the front camera facing up. The camera captures the image from below, the system recognizes the place of touch and transmits it to the laptop. Along the way the idea transformed slightly: recognize buttons drawn on the paper with a marker and detect presses on them. This happened first of all because recognizing the exact point of touch is problematic due to the shadow of the hand, while buttons drawn with a marker are clearly visible and easy to pick out in the image.

Considering that my specialty in programming is JavaScript, we decided this would be a web page opened on the phone. It captures video from the front camera, recognizes the buttons and waits for presses. When an event occurs, the information is sent over sockets to another page on the laptop, which does whatever it is told.

Such a system can be divided into several logical parts:

- video capture
- image preprocessing
- contour search
- determining whether a finger is in the contour
- event transfer to the client page

Let's consider each part in a little more detail.

Video capture


I am sure it is no secret to you that with the getUserMedia method you can get an image from a video camera and pipe it into a video tag. So we create a video tag, ask the user for permission to capture video, and see ourselves on camera.

Some code

```javascript
var video = (function() {
        var video = document.createElement("video");
        video.setAttribute("width", options.width.toString());
        video.setAttribute("height", options.height.toString());
        video.className = (!options.showVideo) ? "hidden" : "";
        video.setAttribute("loop", "");
        video.setAttribute("muted", "");
        container.appendChild(video);
        return video;
    })(),
    initVideo = function() {
        // initialize web camera or upload video
        video.addEventListener('loadeddata', startLoop);
        window.navigator.webkitGetUserMedia({video: true}, function(stream) {
            try {
                video.src = window.URL.createObjectURL(stream);
            } catch (error) {
                video.src = stream;
            }
            setTimeout(function() {
                video.play();
            }, 500);
        }, function(error) {});
    };
// ...
initVideo();
```



To get an individual frame from the video we use a canvas and the drawImage method. This method can take the video tag as its first parameter and draw the video's current frame onto the canvas. Exactly what we need. We repeat this operation at regular intervals.

```javascript
var captureFrame = function() {
    ctx.drawImage(video, 0, 0, options.width, options.height);
    return ctx.getImageData(0, 0, options.width, options.height);
};

window.setInterval(function() {
    captureFrame();
}, 50);
```


Image preprocessing


Now we have a canvas element holding the current frame of the video stream. The next task is recognizing the drawn buttons.
In practice, the format in which ctx.getImageData(...) returns its data is thoroughly inconvenient for this. So before going on to the actual contour search, let's bring the image into a convenient format.

The getImageData method returns one large flat array in which the channels of each pixel are listed one after another. By a convenient format I mean a two-dimensional array of pixels: it is intuitive, and working with it is much more pleasant.



Let's write a small function that converts the data into that form. Here we can exploit the fact that the image passing through the paper is close to black-and-white, so for each pixel we compute the average of its channels and write it into the resulting array. We end up with an array in which each pixel is a value from 0 to 255, addressable by coordinates: data[y][x].
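That conversion can be sketched like this (the function name is mine, not the project's); it assumes the ImageData object returned by ctx.getImageData(): a flat RGBA array with four entries per pixel.

```javascript
// Convert a flat RGBA ImageData-like object into a 2D grayscale array
// addressable as matrix[y][x], with values from 0 to 255.
function toGrayscaleMatrix(imageData) {
    var matrix = [], i = 0;
    for (var y = 0; y < imageData.height; y++) {
        var row = [];
        for (var x = 0; x < imageData.width; x++) {
            // average of R, G and B; the alpha channel is skipped
            row.push(Math.round(
                (imageData.data[i] + imageData.data[i + 1] + imageData.data[i + 2]) / 3
            ));
            i += 4;
        }
        matrix.push(row);
    }
    return matrix;
}
```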



We went further still and decided that 255 possible values per pixel is too many. For recognizing contours and presses, two values are enough: 1 and 0. So our project grew a getContours function that takes the pixel array and a limit variable. If a pixel's value is greater than limit, it becomes a zero (light sheet); otherwise it becomes a one (part of a contour, or a finger).



The getContours function code

```javascript
var getContours = function(matrix, limit) {
    var x, y;
    for (y = 0; y < options.height; y++) {
        for (x = 0; x < options.width; x++) {
            matrix[y][x] = (matrix[y][x] > limit) ? 0 : 1;
        }
    }
    return matrix;
};
```



Now the image is in a convenient form and ready for us to find the buttons on it.



Contour search


Have you ever recognized contours and objects in an image? I never had. A quick round of googling suggested that OpenCV solves such problems without breaking a sweat. In practice it turned out that the ported libraries have limitations, and the classifiers need training. It all felt like using Grails to build a landing page.
So we kept searching for simpler solutions and stumbled upon the "beetle" algorithm (I'm not sure this is a common name, but that's what the article called it).

The algorithm recognizes closed contours in an array of zeros and ones. One important requirement: the borders must be at least two pixels thick, otherwise the logic falls into an infinite loop with all the ensuing consequences. But a button border drawn with a marker is much thicker than two pixels, so this was not a problem for us. Beyond that, the algorithm is very simple:

- scan the image pixel by pixel until a contour pixel (a one) is found;
- from there a "beetle" starts walking: standing on a one, it turns left and steps forward; standing on a zero, it turns right and steps forward;
- every contour pixel the beetle lands on is recorded;
- the walk stops when the beetle returns to the starting pixel, and the recorded pixels form the contour.





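The beetle's walk can be sketched roughly like this; this is my own reconstruction of the technique, not the hackathon code, working on the array of zeros and ones produced by getContours.

```javascript
// Walk the border of the first closed shape found in a 0/1 matrix:
// turn left on a one, turn right on a zero, record every contour pixel,
// and stop when the walk returns to the starting pixel.
function traceContour(matrix) {
    var h = matrix.length, w = matrix[0].length;
    // scan for the first contour pixel
    var sx = -1, sy = -1;
    for (var y = 0; y < h && sx < 0; y++) {
        for (var x = 0; x < w; x++) {
            if (matrix[y][x] === 1) { sx = x; sy = y; break; }
        }
    }
    if (sx < 0) return null; // nothing drawn

    var points = [], t;
    var px = sx, py = sy, dx = 1, dy = 0;  // we "entered" the start moving right
    var steps = 0, limit = 4 * w * h;      // safety cap against infinite loops
    do {
        if (py >= 0 && py < h && px >= 0 && px < w && matrix[py][px] === 1) {
            points.push({ x: px, y: py });
            t = dx; dx = dy; dy = -t;      // on a one: turn left
        } else {
            t = dx; dx = -dy; dy = t;      // on a zero: turn right
        }
        px += dx; py += dy;
    } while ((px !== sx || py !== sy) && ++steps < limit);
    return points;
}
```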
So, we have a function that takes the data and finds a contour. To simplify the task we limited ourselves to rectangular shapes only, so from the contour points we compute two bounding points. Whatever the actual shape of the button, we get the rectangle it is inscribed in.
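Reducing the contour to those two bounding points is a few lines (names are mine): given an array of {x, y} contour pixels, we take the minimum and maximum coordinates.

```javascript
// Compute the axis-aligned rectangle a traced contour is inscribed in.
function getBoundingBox(points) {
    var minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
    points.forEach(function(p) {
        if (p.x < minX) minX = p.x;
        if (p.y < minY) minY = p.y;
        if (p.x > maxX) maxX = p.x;
        if (p.y > maxY) maxY = p.y;
    });
    return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```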



But who needs an interface with just one button? If you're doing it, do it properly! Hence the task of finding all the drawn buttons. The solution turned out to be simple: find a button, remember it in an array, fill its rectangle in the data with zeros, and repeat the search until nothing more is found. The result is an array containing all the detected buttons.
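The search loop can be sketched like this. To keep it self-contained, this version grows each button's bounding box with a simple flood fill over the connected ink pixels instead of the beetle tracer; for a closed, connected contour the resulting rectangle is the same. Names are mine, not the project's.

```javascript
// Find every button in a 0/1 matrix: locate an ink pixel, flood-fill its
// connected component to get the bounding box, zero out the whole rectangle,
// and keep scanning until no ink remains. Mutates the matrix.
function findAllButtons(matrix) {
    var h = matrix.length, w = matrix[0].length, buttons = [];
    for (var y = 0; y < h; y++) {
        for (var x = 0; x < w; x++) {
            if (matrix[y][x] !== 1) continue;
            // flood-fill the connected component, tracking its bounding box
            var stack = [[x, y]];
            var minX = x, minY = y, maxX = x, maxY = y;
            matrix[y][x] = 0;
            while (stack.length) {
                var p = stack.pop(), px = p[0], py = p[1];
                if (px < minX) minX = px;
                if (px > maxX) maxX = px;
                if (py < minY) minY = py;
                if (py > maxY) maxY = py;
                [[px + 1, py], [px - 1, py], [px, py + 1], [px, py - 1]].forEach(function(n) {
                    if (n[1] >= 0 && n[1] < h && n[0] >= 0 && n[0] < w &&
                            matrix[n[1]][n[0]] === 1) {
                        matrix[n[1]][n[0]] = 0;
                        stack.push(n);
                    }
                });
            }
            // zero the whole rectangle so pixels inside the contour go too
            for (var yy = minY; yy <= maxY; yy++)
                for (var xx = minX; xx <= maxX; xx++) matrix[yy][xx] = 0;
            buttons.push({ x: minX, y: minY,
                           width: maxX - minX + 1, height: maxY - minY + 1 });
        }
    }
    return buttons;
}
```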



By the way, one sheet of glass was a casualty of testing the algorithm. Fortunately it was late evening and I was already heading home; in the morning I took another pane out of a window and went back to development.



Determining whether a finger is in the contour


What about button presses? That too turned out to be simple. When a button is found, we compute the sum of black dots inside it; for myself I called this value the button's "hash". When you press the button, its hash grows by a significant amount, clearly exceeding random noise, interference, and the slight shifting of the paper and phone relative to each other. So on each frame we recompute the hash of every known button and compare it with the initial value:
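A sketch of that check (the names and the threshold value are my assumptions, not the hackathon code): the hash is the count of dark pixels inside the button's rectangle in the 0/1 matrix, and a press makes it jump well above the noise level recorded at recognition time.

```javascript
// Sum the dark (value 1) pixels inside a button's rectangle.
function computeHash(matrix, button) {
    var sum = 0;
    for (var y = button.y; y < button.y + button.height; y++) {
        for (var x = button.x; x < button.x + button.width; x++) {
            sum += matrix[y][x];
        }
    }
    return sum;
}

var PRESS_THRESHOLD = 50; // noise margin, to be tuned for the actual setup

// A press is a hash jump well above the value stored when the button was found.
function isPressed(matrix, button) {
    return computeHash(matrix, button) - button.initialHash > PRESS_THRESHOLD;
}
```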




And that's our touch screen.

Pedant mode
Of course, an inquiring mind will notice that this approach leaves huge scope for false positives: cast a stray shadow over a button and it will be "pressed" too.
Well, in general, yes. You can try to fight this with additional checks. For example, build a second array of zeros and ones with a much stricter black limit, so that only the "blackest" pixels survive in the image. The assumption is that only the spot where a finger actually touches the paper remains, and the shadow is sifted out.
Or you can invoke the hackathon rule "do what you want" and declare that it is so intended.
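The double-threshold idea looks roughly like this as a sketch (the limit values are made up and would need tuning to the actual lighting): the loose limit keeps marker lines and shadows, while the strict one keeps only the darkest pixels, so a finger pressed against the paper survives and its softer shadow does not.

```javascript
// Threshold a grayscale matrix into zeros and ones, like getContours,
// but as a pure function so it can be applied twice with different limits.
function threshold(matrix, limit) {
    return matrix.map(function(row) {
        return row.map(function(value) { return (value > limit) ? 0 : 1; });
    });
}

// one row of grayscale values: white sheet, soft shadow, finger on paper
var gray = [[230, 100, 20]];
var contours = threshold(gray, 120); // [[0, 1, 1]] - the shadow still counts
var fingers = threshold(gray, 40);   // [[0, 0, 1]] - only the touch remains
```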


Event transfer to client page


I am sure everyone knows what Socket.io is; if you don't yet, you can read about it at http://socket.io/ . In short, it is a library that lets a node.js server and its clients exchange data in both directions. In our case, we use it to send event information to the other web page through a server with minimal delay.

Video


Preempting the inevitable "where's the video?" in the comments, here is a video demonstrating the system.



Findings




Useful links


Source: https://habr.com/ru/post/242301/

