Real-time video post-processing has been getting more and more important lately. Thanks to the power of modern PCs, almost any user can run a video through a complex filter chain while watching it, with no need for a full re-encode, which is often done with slow and overcomplicated tools.
This area is well covered on the desktop: filters like the ffdshow raw video filter and madVR let you do almost everything you might need for pleasant viewing. Unfortunately, the web cannot boast a similar toolkit, so you either put up with all the flaws of the next YouTube video or open it in an external application like MPC-BE, which is not very convenient. It would be nice to have one magic button that turns on filtering where it belongs: in your browser.
This post is a brief report on my research in this area. The ultimate goal was to assess whether real-time filtering is feasible at a resolution of at least 1920x1080.
Remarks
Things to keep in mind while reading:
- All demos are based on HTML5 video with the loop attribute set. In some browsers this video may stutter and lag badly when it jumps back to the start of the sequence; that is the fault of those browsers. I did not complicate the code to try to work around it.
- If the looping video annoys you, add loop=false to the GET parameters of the request.
- The demos were tested only in Chrome, Firefox and IE11; they may not work in other browsers.
- The source code of every demo is included right in the corresponding HTML page, with no dependencies.
- The text contains many mangled English words and clumsy translations. I am not well versed in Russian terminology, so corrections are welcome.
- We turn a blind eye to possible problems with CORS, sites that use Flash video, and so on. These are strictly spherical tests in a vacuum.
- I am only a passer-by in JavaScript, so do not trust the text below too much. To be safe, you can divide the timings given here by 2. I hope to see fixes and tips in the comments.
Principles of implementation
The only option that allows a single codebase for all target browsers (Chrome and Firefox first of all) is a browser extension. The alternative, Google Chrome Native Client, unsurprisingly works only in Chrome, and Mozilla currently has no plans to support NaCl in Firefox. Besides, I have not looked into whether NaCl can access elements on the page; it may well turn out to be unsuitable for our purposes.
The basic algorithm of the (theoretical) extension is quite simple: find the video element on the page, hide it, and create a canvas on top of it, onto which the filtered frames of the video stream are rendered. So far, so good.
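The overlay step could be sketched roughly as follows. This is only an illustration, not code from the demos: the helper name `overlayCanvas` is hypothetical, `doc` is assumed to be the page's `document`, and the positioning assumes that the video and the new canvas share the same offset parent.

```javascript
// Hypothetical helper: hides a <video> and places a same-sized <canvas> over it.
// Assumption: the video metadata is already loaded, so videoWidth/videoHeight are set.
function overlayCanvas(doc, video) {
  var canvas = doc.createElement('canvas');

  // The drawing buffer matches the native resolution of the video.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  // The on-screen box copies the position and size of the video element.
  canvas.style.position = 'absolute';
  canvas.style.left = video.offsetLeft + 'px';
  canvas.style.top = video.offsetTop + 'px';
  canvas.style.width = video.offsetWidth + 'px';
  canvas.style.height = video.offsetHeight + 'px';

  // visibility:hidden keeps the video in the layout, unlike display:none.
  video.style.visibility = 'hidden';

  // Insert the canvas right after the video in the DOM.
  video.parentNode.insertBefore(canvas, video.nextSibling);
  return canvas;
}
```

In a page this would be called as `overlayCanvas(document, document.querySelector('video'))`; the filtered frames are then drawn onto the returned canvas.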
The real problem with an extension is the implementation language: interpreted JavaScript, and as we know, interpreted languages are poorly suited for serious computation. But that does not matter! JavaScript has been getting a lot of love and optimization lately, and quite a few programmers believe that JS is suitable for writing any application and that everything should move to the web. Moreover, many new technologies are available, such as asm.js, SIMD.js, WebGL and WebCL, which in theory let you implement anything your heart desires at a speed only slightly below native. So we shouldn't have any major problems writing a set of filters in the browser, right?
Not really.
Pure JavaScript
Filtering in pure JS works as follows:
- Get both required elements: the hidden video and the canvas placed on top of it.
- Draw a frame from the video onto the canvas via context.drawImage(video, 0, 0), where context is the 2d context obtained from the canvas.
- Get the frame buffer (an array of color bytes) via context.getImageData(0, 0, width, height).
- Process the buffer with the required filters.
- Put the processed array back via context.putImageData(imageData, 0, 0).
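This algorithm works and really does allow video filtering in pure JavaScript, with a minimal amount of code that looks a lot like C. Here is a basic (unoptimized) implementation of an invert filter, which inverts the RGB bytes in each pixel of a frame:

    // Copy the current video frame onto the canvas and read its pixels back.
    outputContext.drawImage(video, 0, 0);
    var imageData = outputContext.getImageData(0, 0, width, height);
    var source = imageData.data; // RGBA bytes, 4 per pixel
    var length = source.length;
    for (var i = 0; i < length; i += 4) {
        source[i    ] = 255 - source[i];
        source[i + 1] = 255 - source[i + 1];
        source[i + 2] = 255 - source[i + 2];
        // source[i + 3] is alpha and is left untouched
    }
    // Write the processed pixels back to the canvas.
    outputContext.putImageData(imageData, 0, 0);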
This approach works for demos and small images, but it runs out of steam very quickly at high resolutions. The drawImage call by itself is pretty fast even at 1080p, but once getImageData and putImageData are added, the execution time rises to 20-30 milliseconds per iteration. The complete code above already takes 35-40 ms, which is right at the limit for PAL video (25 frames per second, 40 ms per frame). All measurements were taken on an i7-4770K, one of the most powerful desktop processors at the moment. This means that any moderately complex filter is impossible on previous processor generations, regardless of JavaScript performance. Any code, however fast, will be held back by the terrible performance of the canvas itself.
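The frame-budget arithmetic used here can be written down as a small sketch. The helper names (`frameBudgetMs`, `achievableFps`, `averagePassMs`) are made up for illustration, and the timing loop measures only the filter on a plain typed array, so it does not include the canvas overhead discussed above.

```javascript
// Time available for one frame at a given frame rate, in milliseconds.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Frame rate that a given per-frame processing time allows.
function achievableFps(msPerFrame) {
  return 1000 / msPerFrame;
}

// Average wall-clock time of one filter pass over `buffer`.
function averagePassMs(filter, buffer, passes) {
  var start = Date.now();
  for (var i = 0; i < passes; i++) {
    filter(buffer);
  }
  return (Date.now() - start) / passes;
}

// Usage: time an invert pass over a synthetic 1080p RGBA frame.
var testFrame = new Uint8ClampedArray(1920 * 1080 * 4);
var passMs = averagePassMs(function (data) {
  for (var i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];
    data[i + 1] = 255 - data[i + 1];
    data[i + 2] = 255 - data[i + 2];
  }
}, testFrame, 5);
console.log(passMs.toFixed(1) + ' ms per pass, PAL budget is ' + frameBudgetMs(25) + ' ms');
```

The numbers printed will differ from the ones quoted in the text, since the hardware, the browser and the absence of getImageData/putImageData all change the result.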
But JavaScript is not very fast on its own either. Simple operations like inversion or a LUT pass can be done in reasonable time, but any moderately complex filter causes terrible lag.
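For reference, a LUT pass could look something like the sketch below. The function names `buildLut` and `applyLut` are hypothetical; the idea is simply to precompute the result for all 256 possible byte values once and then do one table lookup per channel.

```javascript
// Build a 256-entry lookup table from a per-value function.
// Uint8ClampedArray clamps results to the 0..255 range on write.
function buildLut(fn) {
  var lut = new Uint8ClampedArray(256);
  for (var v = 0; v < 256; v++) {
    lut[v] = fn(v);
  }
  return lut;
}

// Apply the table to the R, G and B bytes of an RGBA buffer, in place.
function applyLut(data, lut) {
  for (var i = 0; i < data.length; i += 4) {
    data[i] = lut[data[i]];
    data[i + 1] = lut[data[i + 1]];
    data[i + 2] = lut[data[i + 2]];
    // alpha (data[i + 3]) is left untouched
  }
}

// Two example tables: inversion and a simple brightness boost.
var invertLut = buildLut(function (v) { return 255 - v; });
var brightenLut = buildLut(function (v) { return v + 40; });
```

Any per-channel point operation (levels, gamma, contrast) fits this scheme, because the cost per pixel stays at three lookups no matter how expensive the function passed to `buildLut` is.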
A naive implementation of a noise filter (adding Math.random() * 10 to each pixel) takes 55 milliseconds, and the 3x3 blur kernel implemented in the code below takes 400 ms, or 2.5 frames per second.
    function blur(source, width, height) {
        // Average of the 3x3 neighbourhood of the channel value at `offset`.
        function blur_core(ptr, offset, stride) {
            return (
                ptr[offset - stride - 4] + ptr[offset - stride] + ptr[offset - stride + 4] +
                ptr[offset - 4]          + ptr[offset]          + ptr[offset + 4] +
                ptr[offset + stride - 4] + ptr[offset + stride] + ptr[offset + stride + 4]
            ) / 9;
        }

        var stride = width * 4; // bytes per row (RGBA)
        // Skip the one-pixel border, where the 3x3 window would leave the frame.
        // The filter works in place, so values above and to the left are already blurred.
        for (var y = 1; y < (height - 1); ++y) {
            var offset = y * stride + 4;
            for (var x = 4; x < stride - 4; x += 4) {
                source[offset] = blur_core(source, offset, stride);
                source[offset + 1] = blur_core(source, offset + 1, stride);
                source[offset + 2] = blur_core(source, offset + 2, stride);
                offset += 4;
            }
        }
    }
Firefox shows even more depressing results: 800 ms per pass. Interestingly, IE11 is about twice as fast as Chrome here (but its canvas is slow, so that does not save it). In any case, it is clear that pure JavaScript is the wrong tool for implementing filters.
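The noise filter mentioned earlier is not shown in the text, so here is one way it might be written. This is an assumption about its shape, not the code that was measured: the name `addNoise` is hypothetical, noise is added to each color channel separately, and the random source is passed in as a parameter so the function can be tested deterministically.

```javascript
// Add uniform noise in the range [0, amplitude) to the R, G and B bytes, in place.
// `random` defaults to Math.random; results are clamped to 255 explicitly so the
// function behaves the same on Uint8Array and Uint8ClampedArray.
function addNoise(data, amplitude, random) {
  random = random || Math.random;
  for (var i = 0; i < data.length; i += 4) {
    for (var c = 0; c < 3; c++) {
      var value = data[i + c] + Math.floor(random() * amplitude);
      data[i + c] = value > 255 ? 255 : value;
    }
    // alpha (data[i + 3]) is left untouched
  }
}
```

Calling `addNoise(imageData.data, 10)` corresponds to the "Math.random() * 10" variant described above. Note that one call to the random generator per channel is a large part of the cost of such a filter.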
asm.js
asm.js is a tool from Mozilla for optimizing the execution of JavaScript code. The generated code still runs in Chrome, but you should not expect a serious performance gain there, since support for asm.js does not seem to have been added yet.
Unfortunately, I could not find a simple way to compile selected functions into asm.js-optimized code. Emscripten generates about 4.5 thousand lines of code when compiling a simple two-line function, and I could not figure out how to extract only the necessary code from it in a reasonable time. Writing asm.js by hand is a dubious pleasure. In any case, asm.js will run into the performance of the canvas 2d context, just like pure JavaScript.
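To give an idea of what hand-written asm.js looks like, here is a minimal sketch of the invert filter in asm.js style. The module name `InvertAsm` and the heap layout are assumptions made for this example. It has not been checked against a validator; if validation fails, the engine simply runs it as ordinary JavaScript, so the result is the same either way.

```javascript
// asm.js-style module: all data lives in one ArrayBuffer "heap".
function InvertAsm(stdlib, foreign, heap) {
  "use asm";
  var u8 = new stdlib.Uint8Array(heap);

  // Invert the R, G and B bytes of the first `length` bytes of the heap.
  function invert(length) {
    length = length | 0; // parameter type annotation: int
    var i = 0;
    for (i = 0; (i | 0) < (length | 0); i = (i + 4) | 0) {
      u8[i >> 0] = 255 - (u8[i >> 0] | 0);
      u8[(i + 1) >> 0] = 255 - (u8[(i + 1) >> 0] | 0);
      u8[(i + 2) >> 0] = 255 - (u8[(i + 2) >> 0] | 0);
    }
  }

  return { invert: invert };
}

// Usage: the heap size is a power of two, as asm.js expects.
var asmHeap = new ArrayBuffer(0x10000);
var asmPixels = new Uint8Array(asmHeap);
asmPixels.set([10, 20, 30, 255]); // one RGBA pixel
var asmModule = InvertAsm({ Uint8Array: Uint8Array }, {}, asmHeap);
asmModule.invert(4);
```

With a real frame, the bytes from getImageData would first have to be copied into the heap and copied back afterwards, which adds yet another pass over the data on top of the canvas cost.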
SIMD.js
SIMD.js is a very new technology for manually optimizing JS applications. It is currently “supported” only in Firefox Nightly, but it may get support in all target browsers very soon. Unfortunately, the API currently works with only two data types, float32x4 and int32x4, which makes the whole thing useless for most real 8-bit filters. Moreover, the Int32x4Array type is not yet implemented even in Nightly, so any reading and writing of data to memory will be slow and ugly (when implemented in this way). Nevertheless, here is the code for the same invert filter (this time done through XOR):
    function invert_frame_simd(source) {
        // Mask that flips the three color bytes and leaves alpha alone.
        var fff = SIMD.int32x4.splat(0x00FFFFFF);
        var length = source.length / 4; // number of 32-bit pixels
        var int32 = new Uint32Array(source.buffer);
        for (var i = 0; i < length; i += 4) {
            // Load four pixels lane by lane, since there is no Int32x4Array yet.
            var src = SIMD.int32x4(int32[i], int32[i+1], int32[i+2], int32[i+3]);
            var dst = SIMD.int32x4.xor(src, fff);
            int32[i+0] = dst.x;
            int32[i+1] = dst.y;
            int32[i+2] = dst.z;
            int32[i+3] = dst.w;
        }
    }
At the moment the code above runs much slower than pure JS: 1600 ms per pass (Nightly users can try the demo). It looks like we will have to wait quite a while before anything useful can be done with this technology. It is also unclear how 256-bit YMM registers will be supported (int32x4 is the usual 128-bit XMM register from SSE2), and whether instructions from newer extensions such as SSSE3 will be available. And SIMD.js does not save you from the slow canvas either. But SIMD fans can already get some familiar bugs right in the browser!
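The XOR trick itself does not need SIMD.js. As a point of comparison, here is a plain JavaScript sketch that processes one whole pixel per operation through a Uint32Array view. This is not SIMD and not the code measured above; the name `invertFrameXor` is hypothetical, and the buffer is assumed to start at a 4-byte-aligned offset, which holds for the data returned by getImageData.

```javascript
// Byte order decides where the alpha byte ends up inside a 32-bit word.
var IS_LITTLE_ENDIAN = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;
// Mask with ones over R, G and B and zeros over alpha.
var RGB_MASK = IS_LITTLE_ENDIAN ? 0x00FFFFFF : 0xFFFFFF00;

// Invert the color bytes of an RGBA byte array, one 32-bit pixel at a time.
function invertFrameXor(source) {
  var words = new Uint32Array(source.buffer, source.byteOffset, source.length >> 2);
  for (var i = 0; i < words.length; i++) {
    words[i] ^= RGB_MASK; // x XOR 0xFF equals 255 - x for every byte
  }
}
```

This only helps for filters that can be expressed as bitwise operations on packed pixels; anything involving per-byte arithmetic with carries still has to unpack the channels.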
WebGL
A completely different way to implement filters is WebGL. In the most basic sense, WebGL is a JS interface to native OpenGL, which lets you run various code on the GPU. It is usually used for graphics programming in games and the like, but nothing stops you from processing pictures or even video with it. WebGL also does not need calls to getImageData, which in theory lets you avoid the typical 20 ms lag.
But nothing comes for free. WebGL is not a general-purpose tool, and using this API for abstract non-graphics code is a terrible pain. You have to define useless vertices (which will always cover the entire frame), position the texture correctly (it will also cover the entire frame), and then use the video as a texture. Fortunately, WebGL is smart enough to request the necessary frames from the video by itself. At least in Chrome and Firefox. IE11 fails with the error
WEBGL11072: INVALID_VALUE: texImage2D: This texture source is not supported
.
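The two pieces of setup described above, the full-frame vertices and the video used as a texture, could be sketched like this. The helper names `createQuadBuffer` and `uploadVideoFrame` are made up for the example; `gl` is assumed to be a WebGL 1 context obtained from canvas.getContext('webgl'), and `texture` a texture created with gl.createTexture(). Shader compilation and the draw call are left out.

```javascript
// Two triangles covering the whole clip space, i.e. the entire frame.
var FULL_FRAME_QUAD = new Float32Array([
  -1, -1,   1, -1,  -1, 1,
  -1,  1,   1, -1,   1, 1
]);

// Upload the "useless" vertices once.
function createQuadBuffer(gl) {
  var buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, FULL_FRAME_QUAD, gl.STATIC_DRAW);
  return buffer;
}

// Copy the current frame of a <video> element into a texture.
// Video sizes are rarely powers of two, so WebGL 1 needs clamped
// wrapping and a filter that does not use mipmaps.
function uploadVideoFrame(gl, texture, video) {
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
}
```

`uploadVideoFrame` would be called once per rendered frame, before drawing the six vertices with gl.drawArrays(gl.TRIANGLES, 0, 6). The texImage2D call with a video element as the source is the one that IE11 rejects.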
Finally, filters have to be written as shaders in the slightly flawed GLSL language, which (at least in its WebGL variant) does not even support initializing constant arrays, so any array must either be passed in through uniforms (a kind of global variable) or filled in by brute force:

    float core1[9];
    core1[0] = 1.0;
    core1[1] = 1.0;
    core1[2] = 0.0;
    core1[3] = 1.0;
    core1[4] = 0.0;
    core1[5] = -1.0;
    core1[6] = 0.0;
    core1[7] = -1.0;
    core1[8] = -1.0;
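Since the shader source is just a string assembled in JavaScript, the element-by-element initialization does not have to be typed by hand. A small generator might look like this; the names `glslFloat` and `glslArrayInit` are hypothetical, and the output is meant to be pasted into the body of a shader function.

```javascript
// Format a number as a GLSL float literal (integers need a decimal point).
function glslFloat(value) {
  return value % 1 === 0 ? value.toFixed(1) : String(value);
}

// Generate a GLSL array declaration followed by one assignment per element.
function glslArrayInit(name, values) {
  var lines = ['float ' + name + '[' + values.length + '];'];
  for (var i = 0; i < values.length; i++) {
    lines.push(name + '[' + i + '] = ' + glslFloat(values[i]) + ';');
  }
  return lines.join('\n');
}

// The kernel from the text, generated instead of written out.
var core1Source = glslArrayInit('core1', [1, 1, 0, 1, 0, -1, 0, -1, -1]);
```

The uniform route mentioned above avoids code generation entirely: declare `uniform float core1[9];` in the shader and fill it from JavaScript with gl.uniform1fv, at the price of one more piece of state to set up before drawing.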
A pixel shader must also return a single value, the color of the current pixel, which rules out the typical implementation of some filters that process several pixels per iteration (deblocking, for example). Such filters have to be rethought and implemented differently.
All in all, technologies like CUDA and OpenCL were invented out of necessity.
In WebGL's defense, its performance is truly amazing for the web (even though you cannot really measure it). At the very least, it can run the prewitt filter from masktools (picking the maximum value out of four 3x3 kernels) in real time at 1080p and higher. If you hate yourself and are not afraid of ending up with somewhat unmaintainable code, WebGL lets you do quite interesting things with video. It may be more reasonable to use the seriously.js library, which hides part of the WebGL boilerplate, but it may not be advanced enough to handle changes in video resolution or to implement temporal filters.
If you love yourself, you will most likely want to use something like WebCL.
WebCL
But that will not work. Wikipedia says that WebCL 1.0 was finalized on March 19 of this year, which makes it the youngest technology on the whole list, even younger than SIMD.js. And, unlike SIMD.js, it will not be supported in Firefox in the near future. I read about a similar decision for Chrome somewhere, but lost the link. So for now WebCL is a dead technology with no clear future.
Conclusion
Real-time video processing in the browser is possible, but the only working option right now is WebGL, where programming video filters is an exercise worthy of true masochists. All the other methods run into the terrible performance of the canvas 2d context, and they do not exactly shine in execution speed themselves. Such is the sad state of affairs.