Random letter generator and its variants

Turning to the topic of writing random letter generators suggested that in JS there is an atypical native function of converting a string to an n-ary number, where n = 2..36. 36 in the standard language was not invented by chance - it is the sum of the number of numbers and small English letters from which it is proposed to write such numbers. This means that with a couple of native functions it is already possible to build a useful generator of small strings from an alphabet.

Math.random().toString(36) //   0.816cwugw2ky, 0.opgqwav8w1m, 0.f0w4ejtq8wk, ...

This means that for some tasks it is possible not to write relatively honest generators based on dull strings of the form “abcdefghijklmno ...”.

The results can be used in any programming language in which we will get used functions in any way, but we will talk about javascript, since the length and simplicity of functions as text in the browser matter in various practical tasks.

Having learned about key words, it is easy to google achievements in building random letter digital generators, which are offered, in particular, at StackOverflow. You can see a funny picture that they first write a conscientious wordy algorithm with a choice of characters from a string, it is furiously plus, like here , then someone remembers that there is such a very short way, which sometimes fits very well into the solution of the problem. However, it does not always bring the matter to the end, and the decision needs to be finished, independently assessing the speed and clarity of the decision.
')
Consider solutions for different typical tasks. Let's try to estimate the speed of some decisions - sometimes you need to generate random strings in large volumes. The result is a story about the methods and their comparison. Let's see what is usually asked on the Internet when it comes to generating random characters.

We will not touch on the quality of random sequences - let's take the Math.random () generator as a basis. Its improvements are highly dependent on the needs and there is a topic for a separate study. Let's not forget about the balance of probabilities of the appearance of various symbols, so that our algorithm does not worsen the probability of the distribution of numbers or symbols. Consider and classic algorithms with a sequence of characters as a model for the generator.

The task in general form is as follows:
" You need to get a sequence of random characters (numbers, letters, small and large letters, all together, the letters of the national alphabet) so that the probability of the appearance of any character is equal for all characters. "

If the question concerns a small number of digits, then the solution is already there, you just need to cut off the desired piece of them from the result of the Math.random () function.

 function randN(n){ // [ 1 ] random numbers return (Math.random()+'').slice(2, 2 + Math.max(1, Math.min(n, 15)) ); }

We will look at first with what it is necessary to deal: how many digits after a point function Math.random () gives.

 Chrome30: 0.5439428824465722 Firefox24:0.8270932732750623 Opera 12: 0.1945655094774541 IE8: 0.48207786689050286

In binary representation (javascript: Math.random (). ToString (2)):

 Chrome30: 0.10010010010000001000110001001001 Firefox24:0.1000101010011110010101000000110000110010010101111001 Opera 12: 0.11000111010010011011101000110100000000100110000011111 IE8: 0.10000011001010000101111011000010000101001001110001

In 36-bit (javascript: Math.random (). ToString (36)):

 Chrome30: 0.acpi1knlm53tyb9 Firefox24:0.r9pe3mirhw Opera 12: 0.kmzlo986rok IE8: 0.33t3if0bsh6j

As you can see, one must be careful in anticipation of the number of digits, and in accuracy, depending on the number system. For the minimum limit, you must take the minimum of the number of digits shown and completely trust no one. At least 16 are shown among decimal digits, so the maximum number was 15, which is already too optimistic, but the value of our first function is small.

The results of profiling the randN () function without checking the limits (300 thousand cycles with 15 characters each, values in milliseconds, averaging over 20 measurements - 300,000 * 15 * 20 = 90 million characters were calculated):

 Ch30: 240.2 ± 2.47% Fx24: 770.8 ± 1.68% Op12: 274.9 ± 3.89% IE8 : 1020.4 ± 0.71%

(for those who are interested in how profiling took place)

 <div class="resTest"> <table class=resTable> <tr><td class=randN></td></tr> </table> <script src="jquery-1.9.1.js"></script> <script type="text/javascript"> onload = function(){ var tSred = function(a){ var s =0; for(var i =0, iL = a.length; i < iL; i++) s += a[i]; return s / iL; }; var tSigm = function(a){ var s = 0, f2 = 0; for(var i =0, iL = a.length; i < iL; i++){ f2 += a[i] * a[i]; s += a[i]; } s = s / iL; return Math.sqrt(f2 / iL - s * s); }; var randN = function(n){ return (Math.random()).toString().slice(2) } ,testName ='randN' ,nIzmer =20 ,timeS =[]; for(var j =0; j < nIzmer; j++){ var i =3e5 ,time0 = +new Date(); while(i-- >0) randN(15); timeS.push(new Date() - time0); } var sred = tSred(timeS).toFixed(1) ,sigm = (tSigm(timeS) / sred *100 ).toFixed(2) ,nu = navigator.userAgent ,br = /MSIE/.test(nu) ?'IE8 : ': /Opera/.test(nu) ?'Op12: ': /Firefox/.test(nu) ?'Fx24: ':'Ch30: ' $('.resTest .resTable .'+ testName).html(br + sred +' ± '+ sigm +'% | '+ timeS.join(', ')); };</script></div>

The same, but with checking the limits of the argument:

 Ch30: 238.4 ± 4.16% Fx24: 782.5 ± 1.84% Op12: 303.1 ± 3.18% IE8 : 1230.5 ± 0.56%

The first instructive conclusion is: although checks seem to be fast operations, they are not made really fast everywhere.

Overhead costs in this profiling take very little and are within the limits of error: if you put an empty cycle, its pass rate is calculated as 100-300 times faster (from 0.5 to 10 ms for the same 300 thousand cycles). Messages about whether it is time to stop the scripts for tests in browsers are specifically disabled. Thus, we measure exactly what we want. It makes sense only a comparison of indicators made on one computer. If we want to make a comparison with IE10, in all other browsers we will have to repeat the tests - the operating system will be different and the speeds of both browsers and computer hardware will be different.

Already interesting? But it was just the preparation of tools for this study. They will work for all other measurements, the main loop and the total given number will be different. The resulting function [1] randN () gives us very little: only a line of up to 15 random numbers. And ahead there are letters, numbers with letters, Russians and other national ones. The most interesting thing is to compare how much the application of native functions of the form .toString (36) differs in speed compared to the more traditional cross-language algorithm.

Latin alphabet

There is little nationalism in this, so a politically correct wording crept into the headline. Already the French and Spaniards will be unhappy that not all their letters of the alphabet are included in the proposed set. But it will be useful for generating random names, if only with small Latin letters alone.

Suppose, for a start, that we will not need a lot of alphabetical letters - a maximum of 10 is enough that Firefox can squeeze out of itself.

 function randWD(n){ // [ 2 ] random words and digits return Math.random().toString(36).slice(2, 2 + Math.max(1, Math.min(n, 10)) ); } //result is such as "46c17fkfpl"

How have the speeds changed (with limits checked: we are serious people)?

 Ch30: 195.4 ± 3.17% Fx24: 1049.0 ± 1.26% Op12: 421.6 ± 2.46% IE8 : 1150.8 ± 0.66%

Some people passed the test 1.5 times longer, Chrome showed miracles at all by completing the 36-bit conversion faster than decimal (and he calculated the binary for 350 ms), IE is a little faster than himself, beyond the limits of errors.

The main result is this: without losing anything, we calculated the alphabet numbers about as quickly as random numbers.

Now let's compare the speed with the dull classical algorithm, where you need to dial the whole alphabet and numbers into the bargain. I did not want to write it, but science requires sacrifice.

Alphabets from the classics

 randWDclassic = function(n){ // [ 3 ] random words and digits by the wocabulary var s ='', abd ='abcdefghijklmnopqrstuvwxyz0123456789', aL = abd.length; while(s.length < n) s += abd[Math.random() * aL|0]; return s; } //such as "46c17fkfpl"

 Ch30: 226.4 ± 0.95% Fx24: 1006.4 ± 0.93% Op12: 917.3 ± 1.14% IE8 : 1268.0 ± 0.45%

He calculated 15 characters in the test instead of 10 in the previous algorithm, showed on average even good results, and he has no problems with scaling. Of the minuses - you need to write a lot of letters. But the first algorithm did not say its last word either. A big plus of the algorithm with a string is the ability to assemble a string of any characters and even provide for the frequency of characters in it, although a multiple of the minimum frequency of occurrence of the only character mentioned.

It means that in this algorithm all the tasks are solved and, moreover, frequency issues. Can native toString (36) oppose something and take at least part of a niche? After all, she has small Latin letters on the issue or just numbers, and on stackoverflow they want everything that you can imagine: small with large letters without numbers, not far off - national sets.

Revenge toString (36)

We need to solve 2 problems with this approach: ensure scalability and at least learn how to produce only numbers. Then, if the algorithm is fast, it will occupy its niche. If only because he is brief.

We use the fact that we can type a string of several characters at a time.

 function randWDn(n){ // [ 4 ] random words and digits unlimited var s =''; while(s.length < n) s += Math.random().toString(36).slice(2, 12); return s.substr(0, n); }

 Ch30: 440.6 ± 0.49% Fx24: 1760.2 ± 5.05% Op12: 858.8 ± 1.36% IE8 : 2495.3 ± 0.35%

Calculated 15 characters (in fact, at least 20), the function is scalable, can take any positive argument. In general, the comparison looks rather disastrous: the algorithm with toString (36) plays the role of a catch-up, comparing in speed in areas of 10, 20, 30, characters and playing in the intervals. It only supports 2 (for now) character sets: numbers and letter numbers. Can it be extended to letters without losing its brevity? Yes.

 function randWn(n){ // [ 5 ] random words var s =''; while(s.length < n) s += Math.random().toString(36).replace(/\d|_/g,'').slice(2, 12); return s.substr(0, n); } //such as "amyozispiizmmrb"

 Ch30: 785.6 ± 0.80% Fx24: 3121.6 ± 4.58% Op12: 1985.5 ± 0.62% IE8 : 5967.2 ± 0.16%

What is there to say? Tested to get 15 characters, actually 20. 90-140 million characters for the entire test. Received purely alphabetic strings. In regexp it is now possible to write other conditions for cutting out unnecessary elements. For example, turn off some letters. Regekspy sags at all. They behave super-failingly in IE8 - they received not 90 million random characters per second, as in Chrome, but some 16, which is also a lot, looking to compare with. In comparison with character sets lose in all respects.

Findings. Using the exotic native function .toString (36) is a beautiful thing. It is beneficial to use it not as part of a function, but for directly taking a random string of up to 10 characters in the code once. Then she looks short, although not everyone understands. In the rest, in terms of speed and flexibility, they lose to character sets [3].

Random Cyrillic on regexp

In the Cyrillic version, there is nowhere to use toString (36). Let's go the other way. Get any character from a certain range of codes and remove unnecessary ones. Something like the sieve of Eratosthenes. It is clear that each deletion of a calculated random symbol is recorded in a passive, therefore, the greater the percentage of deletions, the slower the algorithm and the stronger it will lose to the classics [3]. Optimizing is easy if the character set occupies a limited range of Unicode codes. But then checks and magic numbers appear in the code.

Here, for starters, an example of such a latin assembly. We get the addition of large and small letters. Because of the "laziness", because they chose the range of 0 ... 127, there are many unnecessary calculations of random variables. This does not interfere with the algorithm - it will dial a string if there is at least 1 character in the range, but time suffers.

 function randAa(n){ // [ 5 ] random big/small letters var s =''; while(s.length < n) s += String.fromCharCode(Math.random() *127).replace(/\W|\d|_/g,''); return s; //such as "AujlgLpHLVfDVpNEP" }

(If we want to see how many attempts disappear, simply write ".length" at the end of a long line. The zeroes in the response line are the missing attempts, 1 is used. Approximately such a density: "0101000010000000010010100010100101100". This is not to be surprised how slower there will be an algorithm.)

 Ch30: 2897.7 ± 0.23% Fx24: 7720.0 ± 0.35% Op12: 74651.3 ± 0.35% IE8 : 49903.7 ± 0.21% - 50   100

It is easy to check what is slowing down here a short but difficult regexp (several checks for letters and numbers). The easiest way to optimize is to exclude the range 0 ... 47 by writing Math.random () * 80 +48 inside the brackets.
We do the same for Cyrillic. We take the range where all it is, and delete the non-Cyrillic alphabet. It must be remembered that such cycles are recorded beautifully, but they are performed extremely slowly, 1106/66 times slower than they could be if the range was effectively selected, so using them for calculating several thousand characters is normal, but for millions it will not work. At the same time, the algorithm with the character set in the string is suitable for millions.

 randAYa = function(n){ // [ 6 ] random big/small cyrillic letters var s =''; while(s.length < n) s += String.fromCharCode(Math.random() *1106).replace(/[^--]|_/g,''); return s; //such as "" }

For tests, there were taken 100 times fewer passes with a set of 15 characters, but the results were increased 100 times for visual comparison with others.

 Ch30: 21800.3 ± 1.77% Fx24: 64100.6 ± 1.70% Op12: 57000.0 ± 1.92% IE8 : 341200.5 ± 4.20%

The time can be reduced by 16 times, based on the fact that the code 1105 is the maximum Unicode of the Cyrillic alphabet, and 1040 is the minimum by writing Math.random () * (1106-1040) * 1040 in the right place. But still, seconds per 100 million characters are a lot for mass calculations as compared with the function [3].

Source: https://habr.com/ru/post/197046/

All Articles

Random letter generator and its variants

Latin alphabet

Alphabets from the classics

Revenge toString (36)

Random Cyrillic on regexp

More articles: