
Updating the design of my hamster recently, I wondered if I should make some unusual page with the 404th error? As a child, I was impressed with the
life of Conway (as many of my readers might), I decided to implement it on JS.
It would seem that the complex in life: if a busy cell has 2 or 3 neighbors - does it remain, if empty has exactly 3 - is it born? In this article I will talk about my optimization of the algorithm and rendering on canvas, some not obvious moments of integer / binary arithmetic in JavaScript.
Looking ahead, the final result can be
seen here , the source can be seen in the same place (and even under the license CC BY).
')
We go in a simple way
for(y=1;y<maxy-1;y++)
It certainly works, but the problem is one - on i7-3820@4.3Ghz and Firefox 12 - this code has time to calculate and draw only 8.5 FPS (frames per second). A quick test shows what exactly slows down the rendering.
Rendering optimization
We will draw a pixel only if it has changed - 67 FPS.
Because switching the current color in context to each cell is too hard an operation, we will draw in 2 passes, first all the cells that were born, then all the dead: 80 frames per second. Since I’m not displaying the entire calculated field (so that there are no “glitches” from the collision with the end of the world), I’m not trying to draw invisible cells on the screen - 125 FPS.
Off-screen canvas rendering in practice did not give any acceleration, on the contrary, there was a slight drop due to copying to the visible canvas.
It only remains to start frame rendering not from setTimeout but requestAnimationFrame - in order not to draw animation when the user is not looking at it (for example, if the page is in some background browser tab)
Apparently you have to go to the optimization algorithm ...
Existing optimization methods
The
superiority in speed on near-infinite fields is kept by
hashlife - roughly speaking the field is divided into quad-tree, and the same blocks are not calculated, but the result is immediately taken from the cache. The problem here is that such an algorithm “gets bogged down” slowly, eats a lot of memory, and for our field (256 * 192) it’s like to hammer nails with an electron microscope.
Another group of algorithms works on excluding field blocks from calculations where it is empty (or no changes). But in my case, the field is almost always densely filled with activity, so the speed increase will be, but not fantastic.
Another approach is to store a queue of changing cells, and update the amount immediately in an array. Those. instead of doing X * Y * 8 operations, we only do (number of changed cells in the previous step) * 8. Of course, there are substantial overhead costs for working with the queue, but even for a dense field, acceleration is 3-5 times easier to obtain (for large fields with low fields, this is a fairly good algorithm).
But I will go my own way
The main idea is that since JS arrays are made up of objects, access to them is relatively slow. Well, the calculation of the new state of the cell through the condition is always bad for the processor due to unpredicted branches. Therefore, I will minimize the number of calls to the array and rewrite the code without branching.
I will store the field in bit form (32 values ​​per array element). 32-bit numbers in JS are stored and interpreted exactly as Signed (!) Integer, but if we even get out of 32-bits for one bit, we can get a real number. Another interesting feature is that in JS there are 2 right-shift commands, >> and >>>. >>> differs in that it treats the number as unsigned (and “moves” zeros to the empty space, not the sign bit). It is this shift that we will need to use when working with numbers, where we use all 32 bits.
Now when we go along the line from left to right - we can immediately get the value of 3 consecutive cells: value & 7. To calculate the sum of these cells, let's make a lookup table for 8 possible combinations, and in order not to turn to a slow array even once, we stuff it into one number:
Now we can calculate the sum of 3 cells at once without additional appeals to the array:
sum = (0164624 >> ((value& 7)<<1)) & 3;
Similarly, we can calculate the sum on the top and bottom lines. To exclude the cell itself around which we count the sum - in the middle line we make & 5 instead of & 7. Thus, we obtained 3 sums from 3 rows without accessing the array.
Next, we need to get a new cell state - again we will make the lookup table, 3 bits will go to the sum (maximum 8), 1 bit to the old state:
state_lookup = 0340;
Now we can get a new state of a cell without branches:
(0340 >>> ((sum<<1) | (v2&1)))&1
It remains only to think how all 32 cells can be calculated - for this we need to have access in addition to one cell to the left and to the right. We will have to break the work into 2 parts, 16 cells each, and load additionally at least on the 1st cell to the left and to the right (here we will need an unsigned right shift so as not to get extra units in the higher digits when the negative numbers are shifted). After calculating both 16-cell halves, the finished 32-bit number is written to the array, and the changed cells need to be drawn.
Born cells we can get like:
need_draw = (new_state ^ data[readpage][y ][x]) & new_state;
And the dead:
need_draw = (new_state ^ data[readpage][y ][x]) & ~new_state;
If need_draw == 0, then you do not need to draw anything; otherwise, we run over 32 bits and draw the necessary changes.
The final code is visible in View Source, I will not clutter it here.
I can also note that happy native application writers may have a lookup table of 512 single-bit values ​​— getting the new state from the 9-bit old environment directly. The lookup table would take 64 bytes, and it would fit into the L1 cache line ... Ah, dreams, dreams ...
results
The final speed suits me completely, even on older computers there is a certain performance margin (the animation code tries to draw no more than 30 frames per second).
Browser | FPS with drawing | FPS without drawing |
Firefox 12 | 473 | 1545 |
IE 9 | 209 | 451 |
Chrome 19 | 274 | 1210 |
Remarkably, when you disable hardware acceleration in Firefox, the speed with rendering drops by ~ 1.5 times. But in general, I am glad that FireFox pulled up to the performance bar set by Chrome.
You can test yourself here:
with drawing ,
without drawing .
The result in the finished form
can be seen here . By the way, if you see any shoals on my site on the site, feel free to write about it at
3@14.by , I'd rather know about them from you ...
There are ideas for further optimization and about life in general - in the comments!
