
Distributed computing in Elixir - classic MapReduce example



Elixir and Erlang are well suited for building distributed applications that run many, possibly similar, tasks in parallel. Support for a large number of concurrent processes running in isolation was one of the main design goals of the Erlang virtual machine.


We will test this ability to exploit a multi-core processor with a simple example. Let us count how many times the word "horse" occurs in the stories of O. Henry, stored as text files in one directory. Strictly speaking, we will count occurrences of the character sequence "horse" rather than of the word, and only in lower case.



Counting occurrences of a substring in files


Let's start with a function that counts the occurrences of a substring in the contents of a text file.


    word_count = fn(file, word) ->
      {:ok, content} = File.read(file)
      length(String.split(content, word)) - 1
    end

The function reads the contents of the file and returns the number of occurrences of the word. Error handling is omitted for simplicity.
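To see why splitting works as a counter, here is a minimal sketch on an in-memory string instead of a file. The helper name `count_occurrences` and the sample text are made up for illustration. Splitting on the substring yields one more piece than there are matches, so the count is the number of pieces minus one.

```elixir
# Hypothetical helper: same counting trick as word_count, but on a string.
count_occurrences = fn content, word ->
  length(String.split(content, word)) - 1
end

# Made-up sample text with mixed case.
text = "A horse, a Horse, my kingdom for a horse!"

# Case-sensitive: only the two lower-case matches are counted.
lower_only = count_occurrences.(text, "horse")
IO.inspect(lower_only)

# Downcasing the text first makes the search case-insensitive.
any_case = count_occurrences.(String.downcase(text), "horse")
IO.inspect(any_case)
```

Because the search is for a character sequence, a word such as "horseshoe" would also be counted.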


Add a delay of 1 second to the function, and also display the result of the calculation in the console before returning it.


    word_count = fn(file, word) ->
      :timer.sleep(1000)
      {:ok, content} = File.read(file)
      count = length(String.split(content, word)) - 1
      IO.puts "Found #{inspect count} occurrence(s) of the word in file #{inspect file}"
      count
    end

Now we count the occurrences in each file and print the total.


    Path.wildcard("/data/OGENRI/*.txt")
    |> Enum.map(fn(file) -> word_count.(file, "horse") end)
    |> Enum.reduce(fn(x, acc) -> acc + x end)
    |> IO.puts()

We will also measure the running time of the whole program.
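One detail of the summing step is worth a small illustration. `Enum.reduce/2` takes the first element as the initial accumulator, so it raises `Enum.EmptyError` on an empty list, which is what `Path.wildcard/1` returns when no file matches. The sketch below uses made-up counts rather than real files.

```elixir
counts = [0, 1, 0, 10]  # made-up per-file counts for illustration

# Enum.reduce/2 uses the first element as the initial accumulator.
total = Enum.reduce(counts, fn x, acc -> acc + x end)
IO.puts(total)

# Enum.reduce/3 with an explicit initial value and Enum.sum/1 both
# return 0 for an empty list, where Enum.reduce/2 would raise.
empty_total = Enum.reduce([], 0, fn x, acc -> acc + x end)
IO.puts(empty_total)
IO.puts(Enum.sum([]))
```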


    # sync_word_count.exs
    start_time = :os.system_time(:milli_seconds)

    # Counts occurrences of `word` in `file`, with an artificial 1-second delay.
    word_count = fn(file, word) ->
      :timer.sleep(1000)
      {:ok, content} = File.read(file)
      count = length(String.split(content, word)) - 1
      IO.puts "Found #{inspect count} occurrence(s) of the word in file #{inspect file}"
      count
    end

    # Process the files one after another and sum the counts.
    Path.wildcard("/data/OGENRI/*.txt")
    |> Enum.map(fn(file) -> word_count.(file, "horse") end)
    |> Enum.reduce(fn(x, acc) -> acc + x end)
    |> IO.puts()

    end_time = :os.system_time(:milli_seconds)
    IO.puts "Finished in #{(end_time - start_time) / 1000} seconds"

I have 12 files in total, so I had to wait about 12 seconds, watching the result for each file appear on the screen one second at a time.


    iex sync_word_count.exs
    Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
    Found 0 occurrence(s) of the word in file "/data/OGENRI/businessmen.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/choose.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/four.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/light.txt"
    Found 10 occurrence(s) of the word in file "/data/OGENRI/prevr.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/r_dl.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/r_linii.txt"
    Found 10 occurrence(s) of the word in file "/data/OGENRI/r_sixes.txt"
    Found 9 occurrence(s) of the word in file "/data/OGENRI/serdce.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/stihi.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/voice.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/ways.txt"
    32
    Finished in 12.053 seconds
    Interactive Elixir (1.3.1) - press Ctrl+C to exit (type h() ENTER for help)
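The script measures time by subtracting two timestamps. As an alternative sketch, Erlang's `:timer.tc/1` runs a function and returns the elapsed time in microseconds together with the result. The `slow_task` function below is a hypothetical stand-in for the real work and is not part of the original program.

```elixir
# Hypothetical stand-in for the real work: sleeps 100 ms, returns a fixed value.
slow_task = fn ->
  :timer.sleep(100)
  42
end

# :timer.tc/1 returns {elapsed_microseconds, result_of_the_function}.
{elapsed_us, result} = :timer.tc(slow_task)

IO.puts("Result #{result} computed in #{elapsed_us / 1_000_000} seconds")
```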

Asynchronous task execution


To count the occurrences asynchronously, we will use spawn to create processes, and send and receive to pass messages between them.


We will create a separate process for each file.


    async_word_count = fn(file, word) ->
      caller = self()
      spawn(fn ->
        send(caller, {:result, word_count.(file, word)})
      end)
    end

self() returns the PID of the current process. We store it in the caller variable because inside the spawned function self() would refer to the child process instead. The child process calls the word_count/2 function and sends the result back to the parent process.


To collect the values in the parent process, call receive once for each spawned process. Define a get_result/0 function for this.


    get_result = fn ->
      receive do
        {:result, result} -> result
      end
    end
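The spawn, send and receive pattern can be tried in isolation, without any files. The sketch below is illustrative only: each child squares a number instead of counting words, and the `after` clause is an addition that is not in the original `get_result`, so the parent does not wait forever if a child dies before sending.

```elixir
parent = self()

# Start three processes; each squares its input and sends the result back.
for n <- [1, 2, 3] do
  spawn(fn -> send(parent, {:result, n * n}) end)
end

# One receive per spawned process; arrival order is not guaranteed.
results =
  for _ <- 1..3 do
    receive do
      {:result, value} -> value
    after
      1_000 -> raise "no result within 1 second"
    end
  end

# Sort before printing because the messages may arrive in any order.
IO.inspect(Enum.sort(results))
```

Since the results are only summed in the article's program, the arrival order does not matter there either.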

Update the program.


    # async_word_count.exs
    start_time = :os.system_time(:milli_seconds)

    # Counts occurrences of `word` in `file`, with an artificial 1-second delay.
    word_count = fn(file, word) ->
      :timer.sleep(1000)
      {:ok, content} = File.read(file)
      count = length(String.split(content, word)) - 1
      IO.puts "Found #{inspect count} occurrence(s) of the word in file #{inspect file}"
      count
    end

    # Runs word_count in a new process and sends the result to the caller.
    async_word_count = fn(file, word) ->
      caller = self()
      spawn(fn ->
        send(caller, {:result, word_count.(file, word)})
      end)
    end

    # Blocks until one {:result, _} message arrives and returns its value.
    get_result = fn ->
      receive do
        {:result, result} -> result
      end
    end

    Path.wildcard("/data/OGENRI/*.txt")
    |> Enum.map(fn(file) -> async_word_count.(file, "horse") end)
    |> Enum.map(fn(_) -> get_result.() end)
    |> Enum.reduce(fn(x, acc) -> acc + x end)
    |> IO.puts()

    end_time = :os.system_time(:milli_seconds)
    IO.puts "Finished in #{(end_time - start_time) / 1000} seconds"

Check it out.


    iex async_word_count.exs
    Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
    Found 9 occurrence(s) of the word in file "/data/OGENRI/serdce.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/businessmen.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/four.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/choose.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/light.txt"
    Found 10 occurrence(s) of the word in file "/data/OGENRI/prevr.txt"
    Found 1 occurrence(s) of the word in file "/data/OGENRI/r_linii.txt"
    Found 10 occurrence(s) of the word in file "/data/OGENRI/r_sixes.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/stihi.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/voice.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/ways.txt"
    Found 0 occurrence(s) of the word in file "/data/OGENRI/r_dl.txt"
    32
    Finished in 1.014 seconds
    Interactive Elixir (1.3.1) - press Ctrl+C to exit (type h() ENTER for help)
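For comparison, the same fan-out and collect pattern can be written with the standard library's `Task` module, which wraps spawn, send and receive. This is an alternative to the article's approach, not the method it uses. The sketch below does not read real files: `fake_word_count` and the file names are hypothetical placeholders.

```elixir
# Hypothetical stand-in for word_count/2: waits 200 ms and returns the
# length of the file name instead of reading a real file.
fake_word_count = fn file ->
  :timer.sleep(200)
  String.length(file)
end

files = ["a.txt", "bb.txt", "ccc.txt"]  # made-up file names

total =
  files
  # Start one task per file; all of them run concurrently.
  |> Enum.map(fn file -> Task.async(fn -> fake_word_count.(file) end) end)
  # Wait for each task in turn (default timeout is 5 seconds per task).
  |> Enum.map(&Task.await/1)
  |> Enum.sum()

IO.puts(total)
```

Unlike the bare `receive` version, `Task.await/1` returns the result of a specific task, so the results come back in the same order as the input list.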

Conclusion


Adventures and horses used to be inseparable; now, perhaps, this is not quite so.


Links


» http://elixir-lang.org/getting-started/processes.html
» http://culttt.com/2016/07/27/understanding-concurrency-parallelism-elixir/
» https://elixirschool.com/lessons/advanced/concurrency/
» Code and text files (OGENRI folder)



Source: https://habr.com/ru/post/310978/

