📜 ⬆️ ⬇️

Ruby 2.0 Lazy Enumerable

Recently, my Enumerable::Lazy patch has been accepted into ruby ​​trunk. This means that in ruby ​​2.0 we can:
 a = [1,2,3,4,2,5].lazy.map { |x| x * 10 }.select { |x| x > 30 } #=>    a.to_a #=> [40, 50],     . 

What for?


Ruby is beautiful, and, being an imperative language, it nevertheless allows us to write elegant "functional" code. For example:
 data.map(&:split).map(&:reverse) 

Looks much more readable than:
 data.map { |l| l.split.reverse } 

However, the first example is inefficient: transforming data , Ruby passes through the data twice, generating an intermediate array. As long as the amount of data is small, this deficiency is not critical. But suppose we parse a huge file:
 File.open("text") do |f| f.each.flat_map(&:split).grep(/ruby/) end 

In this case, double pass through the array is not allowed. Fortunately, with the advent of Enumerable::Lazy this is no longer a problem. Lazy versions of the map , select and many other functions, redefined in the Enumerable::Lazy class, do not perform the calculation immediately. Instead, they return an instance of Enumerable::Lazy , allowing you to build chains of lazy evaluation.

Update

Now, when most of the work on Enumerator :: Lazy is finished, I decided to test what kind of gain it actually gives. The results were disappointing (for the first time with a benchmark I was mistaken - hence the optimistic comment from below). On average, the lazy Enumerator is 4 (!) Times slower than operations on ordinary arrays. This is due to the fact that when calculating a lazy chain, the operation to call a block occurs too often. Watch the discussion at ruby-lang.org . Despite this - the benefits of a lazy enumerator are the same - this is an elegant and readable code, while due to the slowness, the number of user cases is reduced.
A good example was given by Yusuke Endoh:
 a = [] Prime.each do |x| next if x % 4 != 3 a << x break if a.size == 10 end 

using laziness turns into:
 Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a 

When?


It’s hard to say who was the first to figure out how to use lazy calculations in Ruby with Enumerator . Perhaps this post in 2008 was one of the first. The idea of ​​implementation is simple and based on the remarkable property of the Enumerator objects — they can be chained together. Since then, many articles have been published , and comments have been added to enumerator.c with explanations of the idea. The debate on ruby-lang lasted more than three years, and finally Matz voted to implement Yataka Hara .
It proposed the method Enumerable::lazy , which "wraps" the array into an instance of Enumerable::Lazy , which, in turn, is used to build a lazy chain. A patch request for C was voiced and I immediately accepted the call. By the way, I recently became interested in functional programming, and to dig into the insides of Ruby was very interesting. As a result, I was honored to request pool , which, after a minor refactoring, was adopted a few days ago. Vmerdzhili it to the trunk and it will be released in Ruby 2.0 ( see roadmap ).

How?


Enumerator (skip if you know)

For those who do not know what an ordinary Enumerator is and what it is eaten with:
 enum = [1, 2].each puts enum.next #=> 1 puts enum.next #=> 2 puts enum.next #=> StopIteration exception raised enum = Enumerator.new do |yielder| yielder << 1 yielder << 2 end puts enum.next #=> 1 puts enum.next #=> 2 puts enum.next #=> StopIteration exception raised enum = "xy".enum_for(:each_byte) enum.each { |b| puts b } # => 120 # => 121 o = Object.new def o.each yield yield 'hello' yield [1, 2] end enum = o.to_enum p enum.next #=> nil p enum.next #=> 'hello' p enum.next #=> [1, 2] enum = %w{foo bar baz}.map puts enum.with_index { |w, i| "#{i}:#{w}" } # => ["0:foo", "1:bar", "2:baz"] #     a = [1, 2, 3] some_method(a.enum_for) [1,2,3].cycle.take(10) #=> [1, 2, 3, 1, 2, 3, 1, 2, 3, 1] 

As you may have guessed, the #to_enum and #enum_for are in the Kernel module, which means they are available to any object. Examples are taken from enumerator.c , if you can look at it interestingly here: test / ruby ​​/ test_enumerator.rb . The enumerator's internal guitars probably deserve a separate post, but it is worth noting that all this magic with the next implemented with the help of Fiber.
')
Lazy enumerator

To understand how Enumerable :: Lazy works, just look at this code:
 module Enumerable class Lazy < Enumerator def initialize(obj) super() do |yielder| obj.each do |val| if block_given? yield(yielder, val) else yielder << val end end end end def map Lazy.new(self) do |yielder, val| yielder << yield(val) end end end end a = Enumerable::Lazy.new([1,2,3]) a = a.map { |x| x * 10 }.map { |x| x - 1 } puts a.next #=> 9 puts a.next #=> 19 

This is a typical Ruby implementation proposed by Yutaka . But pay attention - in this example I do not use &block as a parameter of the method (to convert to Proc and call a call ). Instead, we can yield it right inside the block for each ! block_given? also works as it should. Moreover, being inside a block, we can still call self (like return ). Wow, isn't it? A wonderful post on this topic from Yehuda Katz ( another one ).
The code itself is quite simple, but let's look at it in more detail: As already mentioned, the main idea is in building a chain. Therefore, Enumerable::Lazy#map creates a new instance of Enumerable::Lazy passing it the "previous" object as an argument. When you call to_a ( next , each with a block, take , etc.), the calculation is performed: Ruby goes back to the beginning of the chain (Fig. 1), gets the next value from the original object and returns it to the outside, sequentially modifying the lazy methods ( in the same order in which they were "hung").

Fig.1 Chain lazy enumerators.

C patch

My patch on C completely repeats implementation on Ruby. Except for the fact that instead of calling super directly inside lazy_initialize I create and initialize a generator ( Enumerator::Generator ), which is passed as an argument to the enumerator.
In the final patch, nobu slightly refactored the code: he divided the if-else condition inside the function block into two separate methods ( lazy_init_block_i and lazy_init_block ), and transferred the conditions to lazy_initialize .
Also, instead of passing Ruby an array as a parameter to a function block, it constructs a regular C array with parameters. Accordingly, it is no longer necessary to use rb_ary_entry to get the yielder and the value inside the block.
For example, this:
 static VALUE lazy_map_func(VALUE val, VALUE m, int argc, VALUE *argv) { VALUE result = rb_funcall(rb_block_proc(), id_call, 1, rb_ary_entry(val, 1)); return rb_funcall(rb_ary_entry(val, 0), id_yield, 1, result); } 

turned into this:
 static VALUE lazy_map_func(VALUE val, VALUE m, int argc, VALUE *argv) { VALUE result = rb_yield_values2(argc - 1, &argv[1]); return rb_funcall(argv[0], id_yield, 1, result); } 

Frankly, I was an absolute newcomer to Ruby's guts, so it took two weekends to write this rather trivial pull request. Moreover, at first I tried to go the other way: trying to save blocks of lazy methods as in the enumerator itself. The first crazy pull rekvest watch here . When Enumerator#next was called, all the processes were "applied" one by one over the next value. The lazy map and select worked fine, but when I tried to correct Enumerator#each , I realized that somehow it turns out too difficult (or not?).
You have a hard nut to get here, so if you plan to patch something in Ruby, there are a lot of great articles . As a bonus, another article about why we should be lazy.

findings


Now Enumerator::Lazy has in its arsenal select , map , reject , grep (added by the first patch), flat_map , zip , take , take_while , drop , drop_while , cycle (recently added shugo ). The development and discussion of the API is actively continuing, but if you want to get lazy right now, all you need to do is compile Ruby from the trunk and enjoy it.

Note:

PS This is a translation of my article from English.

Source: https://habr.com/ru/post/140362/


All Articles