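Of the two statements below, the first (which builds a 23-character string) runs almost twice as fast as the second (24 characters):
str = "1234567890123456789012" + "x"   # 22 + 1 = 23 characters
str = "12345678901234567890123" + "x"  # 23 + 1 = 24 characters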
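require 'benchmark'

ITERATIONS = 1_000_000

# Time ITERATIONS concatenations, each producing a string one character
# longer than str; the label shows the length of the resulting string.
def run(str, bench)
  bench.report("#{str.length + 1} chars") do
    ITERATIONS.times do
      new_string = str + 'x'
    end
  end
end

# Inputs of 20..26 characters give results for 21..27 characters.
Benchmark.bm(10) do |bench|
  (20..26).each { |len| run('x' * len, bench) }
end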
                user     system      total        real
21 chars    0.250000   0.000000   0.250000 (  0.247459)
22 chars    0.250000   0.000000   0.250000 (  0.246954)
23 chars    0.250000   0.000000   0.250000 (  0.248440)
24 chars    0.480000   0.000000   0.480000 (  0.478391)
25 chars    0.480000   0.000000   0.480000 (  0.479662)
26 chars    0.480000   0.000000   0.480000 (  0.481211)
27 chars    0.490000   0.000000   0.490000 (  0.490404)
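To put a number on the jump, the ratio of the reported real times can be computed directly from the benchmark results. This small Ruby sketch only does arithmetic on the figures copied from the table; nothing is re-measured, and the helper name `slowdown` is made up for illustration.

```ruby
# Real (wall-clock) times in seconds, copied from the benchmark results.
REAL_TIMES = {
  21 => 0.247459,
  22 => 0.246954,
  23 => 0.248440,
  24 => 0.478391,
  25 => 0.479662,
  26 => 0.481211,
  27 => 0.490404
}.freeze

# How many times slower the `to` row is than the `from` row.
def slowdown(times, from, to)
  times.fetch(to) / times.fetch(from)
end

puts format('21 -> 23 chars: %.2fx', slowdown(REAL_TIMES, 21, 23))
puts format('23 -> 24 chars: %.2fx', slowdown(REAL_TIMES, 23, 24))
```

Growing the result from 21 to 23 characters costs essentially nothing, while the single step from 23 to 24 characters makes the loop about 1.93 times slower.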
Creating a string longer than 23 characters requires a call to malloc, the standard C function for dynamic memory allocation. This is a relatively expensive operation: the allocator has to find a free block of suitable size on the heap, and that block must later be tracked and released once the string is no longer needed.
Ruby has three kinds of strings: heap strings, shared strings, and embedded strings. An RString structure is created for every string, but malloc is called only for the first kind, heap strings. Shared and embedded strings avoid it, which saves resources and improves performance.

How does this optimization work? The first case is shared strings: if a new string is a copy of an existing one, Ruby does not need to allocate new memory for its contents, because the new RString can point to the data that already exists. Such an RString is the fastest to create:
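struct RString {
    long len;      /* length of the string */
    char *ptr;     /* points at the existing, shared character data */
    VALUE shared;  /* the string that actually owns that data */
};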
For embedded strings, malloc is not called at all: the characters are stored directly inside the RString structure, in the char ary[] array:
struct RString {
    char ary[RSTRING_EMBED_LEN_MAX + 1];  /* characters stored inline, plus a terminating NUL */
};
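Whether a string ended up embedded can be observed from Ruby with `ObjectSpace.memsize_of` from the `objspace` standard library. This is a sketch assuming 64-bit MRI: on versions before 3.2 the reported size typically stays at the bare object size up to 23 characters and grows at 24, when a separate heap buffer appears. Ruby 3.2 and later place strings in variable-size heap slots, so the boundary may move; the method's own documentation says to treat its result only as a hint.

```ruby
require 'objspace'

# Print the memory Ruby attributes to freshly built strings of 20..26 chars.
# Exact numbers depend on the Ruby version and platform.
(20..26).each do |len|
  str = 'x' * len
  puts format('%2d chars: %3d bytes', len, ObjectSpace.memsize_of(str))
end
```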
The full definition of RString combines these variants in a union and looks like this:
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};
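Only one member of the union is in use for a given string (in MRI, a flag bit in basic records whether the string is embedded). The choice between the three representations, as the article describes it, can be summarized in a few lines of Ruby. This is a hypothetical model, not MRI's C code: the names `storage_kind`, `EMBED_LEN_MAX`, and the `copy_of_existing:` flag are made up for illustration, and 23 is assumed to be the embedded limit of a 64-bit build.

```ruby
# Hypothetical model of the representation choice described in the article.
# Not MRI source code; the names are illustrative.
EMBED_LEN_MAX = 23 # assumed: RSTRING_EMBED_LEN_MAX on a 64-bit build

# Returns [representation, needs_malloc] for a string of the given length.
def storage_kind(length, copy_of_existing: false)
  if copy_of_existing
    [:shared, false]   # reuses an existing buffer via as.heap.aux.shared
  elsif length <= EMBED_LEN_MAX
    [:embedded, false] # characters fit in as.ary inside the RString itself
  else
    [:heap, true]      # characters live in a buffer obtained from malloc
  end
end

[23, 24].each do |len|
  kind, needs_malloc = storage_kind(len)
  puts "#{len} chars: #{kind}, malloc: #{needs_malloc}"
end
puts "copy of a 100-char string: #{storage_kind(100, copy_of_existing: true).inspect}"
```

The model follows the article's order and checks for a copy first. In real MRI, sharing pays off mainly for heap strings; a copy of a short string is typically just embedded again.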
The size of the ary array is set by RSTRING_EMBED_LEN_MAX so that it occupies the same space as the len, ptr, and capa fields, that is, just 24 bytes on a 64-bit system. Here is the line from ruby.h that defines RSTRING_EMBED_LEN_MAX:
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
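The macro's arithmetic is easy to check by hand. In this Ruby sketch, the hypothetical helper `embed_len_max` mirrors the C expression, with sizeof(VALUE) passed in as a parameter because it depends on the platform (8 bytes on typical 64-bit builds, 4 on 32-bit ones).

```ruby
SIZEOF_CHAR = 1 # sizeof(char) is 1 by definition in C

# Mirrors: ((sizeof(VALUE)*3)/sizeof(char)-1)
def embed_len_max(value_size)
  (value_size * 3) / SIZEOF_CHAR - 1
end

[8, 4].each do |value_size|
  max = embed_len_max(value_size)
  # ary holds RSTRING_EMBED_LEN_MAX + 1 bytes; the extra byte is the NUL terminator.
  puts "sizeof(VALUE) = #{value_size}: ary is #{max + 1} bytes, up to #{max} chars embedded"
end
```

On a 64-bit build this gives 23, exactly where the benchmark slows down; on a 32-bit build the same macro would allow only 11 embedded characters.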
Since one of those 24 bytes is reserved for the terminating NUL character, at most 23 characters of a string fit inside the RString structure. Only when a string is longer is its data placed on the heap, which requires a call to malloc and all the costly bookkeeping that comes with it. That is why "long" strings are processed more slowly.

Source: https://habr.com/ru/post/135832/