Features of creating interactive 3D HTML5 applications with the Kinect sensor.
Task
Show 3D photos and videos, captured in different parts of our vast Motherland, on 3D displays, with playback starting when a user enters a certain geographic zone on a map laid out in front of the screens. Ideally, the stand should consist of 6 3D displays and 3 Kinect sensors, so as to spread the geofences in front of the screens as widely as possible and let several users control their own displays.
Solution
During the discussion we investigated how to display full-fledged stereoscopic (non-anaglyph) 3D content. One possible implementation was the nVidia 3D Vision technology, whose driver and Firefox add-ons made it possible to display 3D.
Among 3D display technologies we settled on the one less expensive than active 3D (shutter glasses + 100 Hz frame-sequential output), namely passive polarized 3D with line-by-line separation of the left and right channels. Circular polarization was chosen as the most tolerant to moving and turning one's head relative to the screen.
It should be noted that the nVidia 3D Vision driver turned out to be omnivorous and could work with almost any type of display (at least with most of those at our disposal). However, the vague API, instability and commercial policy of nVidia quickly made us doubt the choice, so the idea arose to build our own "driver". The arguments were:
- line-by-line output of the left and right channels implies two source images joined side-by-side (top-bottom or left-right),
- if you recombine the pixels of a FullHD side-by-side image into alternating left/right lines, and also achieve strict pixel-to-pixel display of the raster on the LCD matrix, you get a decent 3D picture without any special drivers (the shaders below implement exactly this).
[Figures: the side-by-side source frame and the resulting interlaced frame]
So technological intuition led us to shaders. To begin with, we tested GLSL in WebGL using examples from the three.js framework. The shader took a few lines: check the parity of the line; if it is odd, output pixels from the left image, if even, from the right. All of this built quickly and ran fast on HD video, about 30 fps (Windows 7, Chrome, nVidia GPU). FullHD video dropped the figure to about 12 fps. The profiler showed the bottleneck: before each draw call a huge texture buffer, changing every frame, had to be uploaded, and there was no way to win that back on the fly.

Then Flash and the well-known Pixel Bender (2D) technology came to the rescue. Essentially it is the same GLSL-like syntax, just a slightly different specification. Within a few hours we implemented a Flash component of the 3D player with a simple JS API, as close as possible to the HTML5 <video> element. So we had two implementations of the 3D player behind a JS wrapper. Tests on FullHD video with the Flash 3D player showed a much more decent ~25 fps. That is how we beat the rawness.
WebGL
#ifdef GL_ES
precision highp float;
#endif

uniform sampler2D Sample0; // side-by-side source frame
varying vec2 vUv;

void main() {
    float row = ceil(vUv.y * 1080.0);  // screen line number
    float parity = mod(row, 2.0);      // renamed: a variable called 'mod' shadowed the builtin
    vec2 pixel;
    if (parity == 0.0) {
        pixel = vec2(vUv.x / 2.0, vUv.y);        // even line: sample the left half
    } else {
        pixel = vec2(0.5 + vUv.x / 2.0, vUv.y);  // odd line: sample the right half
    }
    gl_FragColor = vec4(texture2D(Sample0, pixel).xyz, 1.0);
}
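For context, here is a minimal sketch of how such a fragment shader can be wired to a <video> element in three.js. This is not the original project code: the element ids and the trivial pass-through vertex shader are assumptions, and it deliberately reproduces the per-frame texture upload that the profiler flagged as the bottleneck.

// Minimal sketch, assuming a recent three.js build and the interlace GLSL
// above stored in a <script id="interlace-fs"> tag.
var video = document.getElementById('video'); // FullHD side-by-side source
var texture = new THREE.Texture(video);
texture.minFilter = THREE.NearestFilter;      // no filtering across stereo lines
texture.magFilter = THREE.NearestFilter;

var vertexShader = [
  'varying vec2 vUv;',
  'void main() {',
  '  vUv = uv;',
  '  gl_Position = vec4(position, 1.0);', // full-screen quad, no camera transform
  '}'
].join('\n');

var material = new THREE.ShaderMaterial({
  uniforms: { Sample0: { value: texture } },
  vertexShader: vertexShader,
  fragmentShader: document.getElementById('interlace-fs').textContent
});

var scene = new THREE.Scene();
var camera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0, 1);
scene.add(new THREE.Mesh(new THREE.PlaneGeometry(2, 2), material));

var renderer = new THREE.WebGLRenderer();
renderer.setSize(1920, 1080); // pixel-to-pixel with the panel
document.body.appendChild(renderer.domElement);

(function animate() {
  requestAnimationFrame(animate);
  if (video.readyState >= video.HAVE_CURRENT_DATA) {
    texture.needsUpdate = true; // this per-frame upload is what killed FullHD fps
  }
  renderer.render(scene, camera);
})();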
Pixel Bender
<languageVersion: 1.0;>
kernel interlace
<
    namespace : "Interlace 3D";
    vendor : "NMT";
    version : 2;
    description : "Create interlaced video from side-by-side";
>
{
    input image4 oImage;
    output float4 outputColor;

    parameter bool swapEyes
    <
        description: "swap eyes";
        defaultValue: false;
    >;

    void evaluatePixel()
    {
        float2 relativePos = outCoord();
        float modulus = mod(relativePos.y, 2.0);
        if ((swapEyes && modulus <= 1.0) || (!swapEyes && modulus >= 1.0)) {
            relativePos.x = 960.0 + relativePos.x / 2.0;
        } else {
            relativePos.x = relativePos.x / 2.0;
        }
        outputColor = sampleLinear( oImage, relativePos );
    }
}
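The Flash player was exposed to the page through a JS wrapper modeled on the HTML5 <video> element. The article does not list the actual API, so the usage sketch below is purely illustrative: the Stereo3DPlayer name, its options and its methods are assumptions; only the resemblance to <video> comes from the text.

// Hypothetical usage of the 3D player's JS wrapper.
var player = new Stereo3DPlayer('#screen-1', { swapEyes: false });
player.src = 'content/baikal-sbs.mp4'; // FullHD side-by-side clip
player.play();
player.addEventListener('ended', function () {
  player.currentTime = 0; // <video>-like semantics: rewind and wait
});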
And then it got easier! We hooked Kinect up to JS through a TCP server using WebSocket, adapted the tracking of users in Kinect's field of view to several zones, added the effects, applied the design, and it all came together!
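A minimal sketch of the browser side of that bundle, assuming the TCP server forwards Kinect skeleton coordinates as JSON over WebSocket; the port, message format, zone names and boundaries are all illustrative.

// Browser-side sketch: map a tracked user's position to a geofenced zone.
var zones = [
  { id: 'zone-left',   xMin: -1.5, xMax: -0.5 },
  { id: 'zone-center', xMin: -0.5, xMax:  0.5 },
  { id: 'zone-right',  xMin:  0.5, xMax:  1.5 }
];

var ws = new WebSocket('ws://localhost:8181/');
ws.onmessage = function (event) {
  var user = JSON.parse(event.data); // e.g. { id: 1, x: 0.3, z: 2.1 }
  zones.forEach(function (zone) {
    if (user.x >= zone.xMin && user.x < zone.xMax) {
      startPlaybackFor(zone.id, user.id); // app-level hook, not shown here
    }
  });
};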
Results
Driver vs Shader
- The advantage of the shader approach is the ability to display not only pre-rendered 3D photos and video, but also runtime 3D scenes in stereoscopic mode, that is, true 3D in HTML5/WebGL or Flash/Pixel Bender/Stage3D applications in the browser (see the sketch after this list); this approach, however, requires a display with line-by-line separation of the left and right channels.
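A sketch of such runtime stereoscopy, assuming three.js (method names per its current API, e.g. setScissorTest): the scene is rendered from two horizontally offset cameras into the left and right halves of a FullHD frame, which the interlace shader above then recombines. The eye separation value is illustrative.

// Render a live scene as side-by-side stereo with two offset cameras.
var EYE_SEP = 0.064; // metres, roughly human interpupillary distance

function renderStereo(renderer, scene, camera) {
  var left = camera.clone();
  var right = camera.clone();
  left.position.x -= EYE_SEP / 2;
  right.position.x += EYE_SEP / 2;
  left.updateMatrixWorld();
  right.updateMatrixWorld();

  renderer.setScissorTest(true);
  renderer.setViewport(0, 0, 960, 1080);   // left eye into the left half
  renderer.setScissor(0, 0, 960, 1080);
  renderer.render(scene, left);
  renderer.setViewport(960, 0, 960, 1080); // right eye into the right half
  renderer.setScissor(960, 0, 960, 1080);
  renderer.render(scene, right);
  renderer.setScissorTest(false);
  // the resulting side-by-side frame then goes through the interlace shader
}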
Kinect
- it is important for the user to see his or her own image, or at least an image of the hands, during interaction,
- natural gestures that imitate physical laws (inertia, collisions) are required,
- it is very useful to develop in open-space, near-real-world conditions, where anyone passing by can walk up and try it,
- using TCP is a restriction only for JS applications; if you can afford UDP (Flash), use it and avoid problems with memory leaks and low fps, although at the time of writing UDP in the browser was only available in Chrome Canary.
3D display
- to obtain the full stereo effect it is important to configure pixel-to-pixel output in the video card + display pair; this is achieved by disabling all scaling in the video card drivers and in the display settings, and by selecting exactly the resolution native to the monitor,
- it is important to follow depth-ordering rules for objects on the screen (the layer of controls and text information, the layer of 3D objects, the background): wrong depth overlap between these entities severely affects perception and can cause fatigue,
- it is important to pick glasses that block the opposite channels well (the right eye from the left channel and vice versa): even glasses from the manufacturer of the 3D panel itself let the opposite channel through as a purple ghost image; we found a fairly high-quality pair in the LG 55” 3D screen + Philips glasses combination,
- take care of glare on the screen: there should be none, since it is very distracting from 3D perception and looks like an extra image layer.