Ruby 2.1 in detail (Part 1)

Ruby 2.1, the latest significant version of Ruby at the time of writing, was released on Christmas Day 2013, only 10 months after the release of 2.0.0. It shipped with a number of changes and improvements, and this post describes them in detail.

New versioning policy

With version 2.1, Ruby moves to a new versioning scheme based on Semantic Versioning.

Versions are released according to the scheme MAJOR.MINOR.TEENY, i.e. in version 2.1.0, 2 is the major version, 1 is the minor version, and 0 is the teeny version. The teeny version number replaces the patch level for minor bug fixes and security fixes. The minor version number will be used for new features that are mostly backward compatible, and the major version for incompatible changes that cannot be released under a minor version.

This means that instead of, for example, 1.9.3 and 1.9.3-p545, we will get releases of the form 2.1.0 and 2.1.1.

Minor versions are planned to be released every 12 months, i.e. we can expect Ruby 2.2 at Christmas 2014.
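To make the scheme concrete, here is a minimal sketch that splits a version string into its three parts and compares releases. The helper name version_parts is made up for this illustration; Gem::Version and Gem::Requirement come from RubyGems and are used here only as a convenient way to compare versions, not as anything Ruby's release process depends on.

```ruby
require "rubygems" # provides Gem::Version and Gem::Requirement

# Hypothetical helper: split MAJOR.MINOR.TEENY into three integers.
def version_parts(version)
  version.split(".").map(&:to_i)
end

version_parts("2.1.0") #=> [2, 1, 0]

# A teeny release sorts after the release it patches.
Gem::Version.new("2.1.1") > Gem::Version.new("2.1.0") #=> true

# "~> 2.1.0" accepts teeny releases of 2.1 but not the next minor version.
requirement = Gem::Requirement.new("~> 2.1.0")
requirement.satisfied_by?(Gem::Version.new("2.1.1")) #=> true
requirement.satisfied_by?(Gem::Version.new("2.2.0")) #=> false
```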

Required keyword arguments

Keyword arguments, introduced in Ruby 2.0.0, have received a small improvement. You can now omit the default value for a keyword argument when defining a method, and if that argument is not supplied when the method is called, an ArgumentError is raised.

    # length is required
    def pad(num, length:, char: "0")
      num.to_s.rjust(length, char)
    end

    pad(42, length: 6) #=> "000042"
    pad(42) #=> #<ArgumentError: missing keyword: length>

As the example above shows, in some situations keyword arguments are needed to remove ambiguity, yet no suitable default value exists for them. Now you no longer have to choose between the two.

String#freeze optimization

Since Ruby strings are mutable, each string literal is a new object each time it is executed:

    def env
      "development"
    end

    # returns new String object on each call
    env.object_id #=> 70329318373020
    env.object_id #=> 70329318372900

This can be very wasteful: a number of objects are created only to be reclaimed by the garbage collector. To avoid this, you can call #freeze directly on the literal, which means the string is looked up in a table of frozen strings and the same object is used every time:

    def env
      "development".freeze
    end

    # returns the same String object on each call
    env.object_id #=> 70365553080120
    env.object_id #=> 70365553080120

String literals used as hash keys get the same treatment without the need to call #freeze.

    a = {"name" => "Arthur"}
    b = {"name" => "Ford"}

    # same String object used as key in both hashes
    a.keys.first.object_id #=> 70253124073040
    b.keys.first.object_id #=> 70253124073040

During the development of 2.1, this feature started out as a syntax change, where "string"f produced a frozen string. It was later decided to recognize the #freeze call on a literal instead, which does not break forward or backward compatibility; besides, many people disliked the unnecessary change to the syntax.

def returns the method name as a Symbol object.

The result of a method definition is no longer nil but the Symbol corresponding to the method name. The canonical use of this is declaring a single method private.

    class Client
      def initialize(host, port)
        # ...
      end

      private def do_request(method, path, body, **headers)
        # ...
      end

      def get(path, **headers)
        do_request(:get, path, nil, **headers)
      end
    end

It can also be used to add decorators to methods. The following example uses Module#prepend to wrap a method in before/after callbacks:

    module Around
      def around(method)
        prepend(Module.new do
          define_method(method) do |*args, &block|
            send(:"before_#{method}") if respond_to?(:"before_#{method}", true)
            result = super(*args, &block)
            send(:"after_#{method}") if respond_to?(:"after_#{method}", true)
            result
          end
        end)
        method
      end
    end

    class Example
      extend Around

      around def call
        puts "call"
      end

      def before_call
        puts "before"
      end

      def after_call
        puts "after"
      end
    end

    Example.new.call

This prints:

    before
    call
    after

The define_method and define_singleton_method methods have also been changed and now return Symbol objects instead of their proc arguments.
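The same trick therefore works with define_method. The class and method names below are invented for this sketch; it only illustrates that the returned Symbol can be handed straight to private.

```ruby
class Greeter
  # define_method returns :secret_greeting, which private then receives.
  private define_method(:secret_greeting) { "hello" }

  def greet
    secret_greeting
  end
end

Greeter.new.greet #=> "hello"
Greeter.private_method_defined?(:secret_greeting) #=> true
```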

Literals for rational and complex numbers

Ruby already has literals for the Integer (1) and Float (1.0) classes; 2.1 adds literals for the Rational (1r) and Complex (1i) classes.

They work well with Ruby's numeric coercion in mathematical operations, so one third can be written in Ruby as 1 / 3r. 3i produces the complex number 0 + 3i, which means complex numbers can be written in standard mathematical notation: 2 + 3i in Ruby produces the complex number 2 + 3i!
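A few expressions to try in irb, as a small sketch of how the new literals combine with ordinary integers:

```ruby
# Rational literals: the r suffix builds a Rational.
1 / 3r                   #=> (1/3)
1 / 3r == Rational(1, 3) #=> true
1 / 3r + 2 / 3r          #=> (1/1)
1.5r                     #=> (3/2)

# Complex literals: the i suffix builds an imaginary number.
2 + 3i                   #=> (2+3i)
2 + 3i == Complex(2, 3)  #=> true
1i * 1i                  #=> (-1+0i)
```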

Array#to_h and Enumerable#to_h

The set of classes that received the #to_h method in Ruby 2.0.0 has now been expanded with the Array class and any other class that includes Enumerable.

    [[:id, 42], [:name, "Arthur"]].to_h #=> {:id=>42, :name=>"Arthur"}

    require "set"

    Set[[:id, 42], [:name, "Arthur"]].to_h #=> {:id=>42, :name=>"Arthur"}

This can be useful when using Hash methods that return an Array:

    headers = {"Content-Length" => 42, "Content-Type" => "text/html"}
    headers.map {|k, v| [k.downcase, v]}.to_h
    #=> {"content-length" => 42, "content-type" => "text/html"}

Fine-grained method caching

Prior to version 2.1, Ruby used a global method cache, which was invalidated for all classes whenever a new method was defined, a module was included, an object was extended with a module, and so on, anywhere in your code. This made some classes (such as OpenStruct) and some techniques (such as exception tagging) impractical for performance reasons.

This is no longer a problem: Ruby 2.1 uses a method cache based on the class hierarchy, invalidating the cache only for the given class and its subclasses.

A method has been added to the RubyVM class that returns some debug information for the method cache:

    class Foo
    end

    RubyVM.stat #=> {:global_method_state=>133, :global_constant_state=>820, :class_serial=>5689}

    # setting constant increments :global_constant_state
    Foo::Bar = "bar"
    RubyVM.stat(:global_constant_state) #=> 821

    # defining instance method increments :class_serial
    class Foo
      def foo
      end
    end
    RubyVM.stat(:class_serial) #=> 5690

    # defining global method increments :global_method_state
    def foo
    end
    RubyVM.stat(:global_method_state) #=> 134

Exceptions

Exception objects now have a #cause method that returns the exception that caused them. It is set automatically when you rescue one exception and raise another.

    require "socket"

    module MyProject
      Error = Class.new(StandardError)
      NotFoundError = Class.new(Error)
      ConnectionError = Class.new(Error)

      def self.get(path)
        response = do_get(path)
        raise NotFoundError, "#{path} not found" if response.code == "404"
        response.body
      rescue Errno::ECONNREFUSED, SocketError => e
        raise ConnectionError
      end
    end

    begin
      MyProject.get("/example")
    rescue MyProject::Error => e
      e #=> #<MyProject::ConnectionError: MyProject::ConnectionError>
      e.cause #=> #<Errno::ECONNREFUSED: Connection refused - connect(2) for "example.com" port 80>
    end

At the moment the original exception is not printed anywhere, and rescue pays no attention to the cause, but having #cause available can help significantly with debugging.
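Since nothing prints the chain for you, a small helper can walk it by hand. This is a self-contained sketch: ConfigError, load_port and cause_chain are names made up for the illustration, and only Exception#cause itself is part of Ruby.

```ruby
class ConfigError < StandardError; end

def load_port(text)
  Integer(text)
rescue ArgumentError
  # Raising inside rescue records the rescued exception as #cause.
  raise ConfigError, "invalid port: #{text.inspect}"
end

# Hypothetical helper: collect an exception and all of its causes.
def cause_chain(error)
  chain = []
  while error
    chain << error
    error = error.cause
  end
  chain
end

begin
  load_port("eighty")
rescue ConfigError => e
  e.cause.class              #=> ArgumentError
  cause_chain(e).map(&:class) #=> [ConfigError, ArgumentError]
end
```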

Exceptions also gain a #backtrace_locations method, which was curiously missing from 2.0.0. It returns Thread::Backtrace::Location objects instead of strings, which gives easier access to the details of the backtrace.
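As a brief sketch of what that buys you, the location objects expose the parts of each frame as separate methods, so there is no need to parse backtrace strings. The method name risky is invented for the example.

```ruby
def risky
  raise "boom"
end

begin
  risky
rescue => e
  location = e.backtrace_locations.first
  location.class      #=> Thread::Backtrace::Location
  location.base_label #=> "risky"
  location.lineno     # Integer line number of the raise
  location.path       # file the frame belongs to
end
```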

Generational garbage collector

Ruby 2.1 introduces a generational garbage collector, which divides all objects into a young and an old generation. A regular GC run looks only at young-generation objects, while old-generation objects are examined much less often. Sweeping is done the same way as in 1.9.3 (lazy sweep). If a young object survives a GC run, it is promoted to the old generation.

If old objects refer to young objects and only the young generation is examined, the GC can mistakenly conclude that a young object has no references and free it. To prevent this, write barriers were introduced: they add an old object to a special remembered set (remember set) when it starts referring to a young object (for example, old_array.push(young_string)). This set is then taken into account when marking the young generation.
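The idea can be modelled in a few lines of plain Ruby. This is a toy model only, not how MRI implements it: ToyObject, ToyHeap and their methods are invented for the illustration. It shows why a reference stored through the barrier keeps the young object alive during a minor mark, while a reference stored behind the barrier's back is missed.

```ruby
# Toy model only: plain Ruby objects stand in for heap slots.
class ToyObject
  attr_reader :name, :refs

  def initialize(name, old:)
    @name, @refs, @old = name, [], old
  end

  def old?
    @old
  end
end

class ToyHeap
  attr_reader :remembered_set

  def initialize
    @remembered_set = []
  end

  # Write barrier: every reference store is meant to go through here.
  def write(from, to)
    from.refs << to
    return unless from.old? && !to.old?
    @remembered_set << from unless @remembered_set.include?(from)
  end

  # Minor mark: scan young roots plus remembered old objects,
  # never old objects that were not remembered.
  def minor_mark(roots)
    marked = []
    pending = roots.reject(&:old?) + @remembered_set
    until pending.empty?
      obj = pending.pop
      marked << obj unless obj.old? || marked.include?(obj)
      obj.refs.each do |child|
        next if child.old? || marked.include?(child)
        marked << child
        pending << child
      end
    end
    marked
  end
end

heap = ToyHeap.new
old_array    = ToyObject.new("old_array", old: true)
old_hash     = ToyObject.new("old_hash", old: true)
young_string = ToyObject.new("young_string", old: false)
missed       = ToyObject.new("missed", old: false)

heap.write(old_array, young_string) # goes through the barrier
old_hash.refs << missed             # bypasses the barrier

heap.minor_mark([old_array, old_hash]).map(&:name) #=> ["young_string"]
```

In the toy model the object named "missed" would be swept even though old_hash still points at it, which is exactly the bug the remembered set exists to prevent.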

Most generational garbage collectors need such barriers on all objects, but for many third-party C extensions for Ruby this is not possible. As a workaround, objects that have no write barriers (so-called "shady" objects) are never promoted to the old generation. This solution does not take full advantage of a generational collector, but it provides maximum backward compatibility.

The marking phase is now much faster, but write barriers add overhead of their own, so the difference in performance depends directly on what your code does.

The GC module

The GC.start method accepts two new keyword parameters, full_mark and immediate_sweep. Both default to true.
If full_mark is true, both generations are marked; if false, only the young generation is. If immediate_sweep is true, a full stop-the-world sweep is performed immediately; if false, a lazy sweep is performed, only when needed and freeing only the minimum required number of objects.

    GC.start # trigger a full GC run
    GC.start(full_mark: false) # only collect young generation
    GC.start(immediate_sweep: false) # mark only
    GC.start(full_mark: false, immediate_sweep: false) # minor GC

The GC.stress debugging option can now be set to an integer flag that selects which part of the collector to stress.

    GC.stress = true # full GC at every opportunity
    GC.stress = 1 # minor marking at every opportunity
    GC.stress = 2 # lazy sweep at every opportunity

GC.stat now reports more details, and can also take a key and return only the value for that key instead of the entire hash of values.

    GC.stat #=> {:count=>6, ... }
    GC.stat(:major_gc_count) #=> 2
    GC.stat(:minor_gc_count) #=> 4

There is also a new latest_gc_info method, which returns information about the most recent garbage collector run.

 GC.latest_gc_info #=> {:major_by=>:oldgen, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false}
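One way to put these counters to work is to sample them around a piece of code. The sketch below uses only the GC.stat keys shown above; the helper name gc_runs_during is an assumption made for the example, and the numbers it reports will vary from run to run.

```ruby
# Hypothetical helper: count GC runs that happen while the block executes.
def gc_runs_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count],
    total: after[:count] - before[:count]
  }
end

# An explicit GC.start shows up in the counters.
runs = gc_runs_during { GC.start }
puts "minor: #{runs[:minor]}, major: #{runs[:major]}, total: #{runs[:total]}"
```

Wrapping a request handler or a test case in such a block gives a rough idea of how much collector activity it causes.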

GC tuning environment variables

Many new environment variables have been introduced that are taken into account when Ruby's garbage collector is running.

RUBY_GC_HEAP_INIT_SLOTS

This option was previously available as RUBY_HEAP_MIN_SLOTS. It sets the initial number of allocated slots and defaults to 10000.

RUBY_GC_HEAP_FREE_SLOTS

This option was also previously available as RUBY_FREE_MIN. It sets the minimum number of slots that should be available after running the GC. If the GC does not free enough slots, new ones will be allocated. The default is 4096.

RUBY_GC_HEAP_GROWTH_FACTOR

Sets the factor by which the number of allocated slots grows: (next slots number) = (current slots number) * (this factor). The default is 1.8.

RUBY_GC_HEAP_GROWTH_MAX_SLOTS

The maximum number of slots allocated at one time. The default is 0, which means that the maximum is not set.

RUBY_GC_MALLOC_LIMIT

This option is not new, but deserves a mention. It sets the amount of memory that can be allocated without triggering garbage collection. The default is 16 * 1024 * 1024 (16MB).

RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR

The growth factor for malloc_limit. The default is 1.4.

RUBY_GC_MALLOC_LIMIT_MAX

The maximum that malloc_limit can reach. The default is 32 * 1024 * 1024 (32MB).

RUBY_GC_OLDMALLOC_LIMIT

The amount that the old generation can grow by before a full garbage collection is triggered. The default is 16 * 1024 * 1024 (16MB).

RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR

The growth factor for old_malloc_limit. The default is 1.2.

RUBY_GC_OLDMALLOC_LIMIT_MAX

The maximum that old_malloc_limit can reach. The default is 128 * 1024 * 1024 (128MB).
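These variables are read when the interpreter starts, so they have to be in the environment before Ruby is launched; changing ENV inside a running program does not retune its collector. The sketch below starts a child Ruby with one of them set. The helper name run_with_gc_env and the value 1.5 are assumptions for illustration, and the script only confirms that the variable reached the child process; whether a setting pays off has to be judged from GC.stat in your own application.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a script in a child Ruby with extra GC settings
# in its environment and return whatever it wrote to standard output.
def run_with_gc_env(settings, script)
  output, status = Open3.capture2(settings, RbConfig.ruby, "-e", script)
  raise "child Ruby failed" unless status.success?
  output
end

settings = { "RUBY_GC_HEAP_GROWTH_FACTOR" => "1.5" }
puts run_with_gc_env(settings, 'print ENV["RUBY_GC_HEAP_GROWTH_FACTOR"]')
```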

ObjectSpace tools for tracking memory leaks

Ruby 2.1 adds several tools for tracking down situations where references to old or large objects are kept around, preventing the garbage collector from reclaiming them.

There is now a set of methods for finding out where an object was allocated:

    require "objspace"

    module Example
      class User
        def initialize(first_name, last_name)
          @first_name, @last_name = first_name, last_name
        end

        def name
          "#{@first_name} #{@last_name}"
        end
      end
    end

    ObjectSpace.trace_object_allocations do
      obj = Example::User.new("Arthur", "Dent").name
      ObjectSpace.allocation_sourcefile(obj) #=> "example.rb"
      ObjectSpace.allocation_sourceline(obj) #=> 10
      ObjectSpace.allocation_class_path(obj) #=> "Example::User"
      ObjectSpace.allocation_method_id(obj) #=> :name
      ObjectSpace.allocation_generation(obj) #=> 6
    end

The number returned by the allocation_generation method is the number of garbage collections that had run by the time the object was created. So if this number is small, the object was created early in the life of the application.

There are also trace_object_allocations_start and trace_object_allocations_stop methods as an alternative to the block form of trace_object_allocations, and a trace_object_allocations_clear method to reset the recorded allocation data.
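A minimal sketch of the start/stop/clear variant follows. The helper name allocation_site_of is invented for the example; it simply wraps the three ObjectSpace calls so that tracing is always stopped and the recorded data dropped, even if the block raises.

```ruby
require "objspace"

# Hypothetical helper: return [file, line] where the block's result was allocated.
def allocation_site_of
  ObjectSpace.trace_object_allocations_start
  obj = yield
  [ObjectSpace.allocation_sourcefile(obj), ObjectSpace.allocation_sourceline(obj)]
ensure
  ObjectSpace.trace_object_allocations_stop  # balance the start call
  ObjectSpace.trace_object_allocations_clear # drop the recorded data
end

file, line = allocation_site_of { Object.new }
puts "#{file}:#{line}"
```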

In addition, this information and more can be dumped to a file or string in JSON format for further analysis or visualization.

    require "objspace"

    ObjectSpace.trace_object_allocations do
      puts ObjectSpace.dump(["foo"].freeze)
    end

This prints:

 { "address": "0x007fd122123f40", "class": "0x007fd121072098", "embedded": true, "file": "example.rb", "flags": { "wb_protected": true }, "frozen": true, "generation": 6, "length": 1, "line": 4, "references": [ "0x007fd122123f68" ], "type": "ARRAY" }

It is also possible to use the ObjectSpace.dump_all method to get information about all the memory on the heap.

    require "objspace"

    ObjectSpace.trace_object_allocations_start
    # do things ...
    ObjectSpace.dump_all(output: File.open("heap.json", "w"))

Both of these methods can be used without activating object allocation tracing, but less information will be obtained.

Finally, there is the ObjectSpace.reachable_objects_from_root method, which is similar to ObjectSpace.reachable_objects_from but takes no arguments and works from the application root. This method has one quirk: it returns a hash that compares its keys by identity, so to look something up you need not just an equal string but exactly the same object, with the same object_id, as the key in the hash itself. Fortunately, you can work around this:

    require "objspace"

    reachable = ObjectSpace.reachable_objects_from_root
    reachable = {}.merge(reachable) # workaround compare_by_identity
    reachable["symbols"] #=> ["freeze", "inspect", "intern", ...

That concludes the first part. The next one will follow in a couple of days and will cover refinements, the new RubyGems, changes to lambdas and much more.


Source: https://habr.com/ru/post/222941/
