📜 ⬆️ ⬇️

Writing celestial bees on Cloud Haskell


Hi, Habr!

Only 11577635 seconds have passed since the end of GoTo's fall school at ITMO. The week of the direction of Distributed Systems began with the prototyping of a distributed system on Cloud Haskell. We started cheerfully and quickly found out that it was difficult to understand the existing documentation without PhD - and decided to write a training manual.

Under the cat, an introduction to p2p cloud haskell, a slightly functional PC prototyping stack, motivation and “but why”.



Suppose you want to do something distributed (say% sCoin) that is not covered by well-existing systems (YARN does not answer all the questions). If you start doing everything with your hands, you can quickly find a huge number of problems - from multiplexing connections and encryption to breaking through NAT and peer routing, which you really don't want to solve (not for the first time in human history), especially if the goal is a final product or a beautiful working prototype.
')
Any application programmer from such a problem statement will quickly come up with the word "library". And really. You can take discovery and routing bits from, for example, Kademlia, standard NAT breakdown mechanisms — STUN, TURN, ICE — in general, are also known; etc.

But it will still require a lot of time and expertise. Investors patience may not be enough.

Here, the more experienced colleagues will come up with the idea: “we need a framework!”. And true. For Prototyping Distributed Systems and Applications.

And someone will even say: bs , this is libp2p ! And he will be right. Partially.

libp2p solves the problem of transport, its multiplexing and encryption, discovery, peer routing, NAT punching, connection upgrade, etc. - in general, many network and crypto requirements of distributed applications. On Go and JS.

This is a great framework, but it has a couple of problems. This is Go and JS. In addition, it would be nice to have something in the framework for replication.



The Haskell Cloud

http://www.scs.stanford.edu/14sp-cs240h/projects/joshi.pdf , paraphrased

Our project began with an ambition to make a blockchain (excuse me, innovation) on Haskell - so we did not have libp2p - and in four days. We started looking for something that would make the network (transport, discovery, serialization) for us. Found Cloud Haskell . Found that the documentation is rather complicated. We decided to write your own introduction. So:

Writing celestial bees on Cloud Haskell


In the example, we will write a system of bees: there is a hive — a cluster of machines, and bees — a node (machine). The bees go to search for flowers to search for and return to the hive with the coordinates of tasty flowers, and all other bees should learn about these coordinates.

It is not at all necessary for you to run the program on several computers - a laptop on which we will simultaneously launch our program is enough.

The full code is in the repository .

Cloud Haskell works on the principle of exchanging messages between nodes (such a model is called message passing ), because nodes do not share the common resource space (RAM, ...) - the shared state model is not easy to use. Actor Model is a particular example of a message passing model, when messages are sent by actors to other actors and receive messages in their mailbox - as message passing looks like in Haskell Cloud.



1. First, let's define the types of data whose representatives the bees will exchange: src / Types.hs

type Flower = (Int, Int) --   type Flowers = GSet Flower -- Grow-Only Set  


2. We proceed to the implementation of the node, which we will later run from the command line: app / Main.hs

 main = do --   [port, bootstrapPort] <- getArgs -- (1)     bootstrap      let hostName = "127.0.0.1" -- IP  P2P.bootstrap --       : hostName port --   (\port -> (hostName, port)) initRemoteTable -- (2)  remote table [P2P.makeNodeId (hostName ++ ":" ++ bootstrapPort)] --    bootstrap  spawnNode --    ,      

  1. To start, the nodes must somehow learn about each other, that is, perform peer discovery . In Cloud Haskell there is an out-of-box solution — when initializing a node, it’s enough for us to specify at least one other bootstrap node: the node performs with the bootstrap node of the Peer Exchange - they exchange the addresses of the node they know (aka peers ).
  2. Remote table is a thing that allows peers to exchange haskell types, if they support serialization , that is, they can be represented in a format that can be sent over the network and restored back to a Haskell object. A type supports serialization if it implements the class (Binary a, Typeable) => Serializable a time class (Binary a, Typeable) => Serializable a . You don’t have to invent the Serializable implementation yourself, Binary and Typeable - haskell does it for you (using the automatic deriving magic mechanism):

     {-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE DeriveGeneric #-} --  ,    Binary data Example = Example deriving (Typeable, Generic) instance Binary Example 
    Next we will omit deriving ... , instance Binary binary and pragmas for the sake of brevity.

3. Now we write the startup logic for the node:

 spawnNode :: Process () -- (1)     spawnNode = do liftIO $ threadDelay 3000000 --  bootstrap     let flowers = S.initial :: Flowers --  GSet     self <- getSelfPid -- (3)   Pid  REPL     repl <- spawnLocal $ runRepl self -- (2)  REPL    register "bees" self --        "bees" spawnLocal $ forever $ do -- (3)  : send self Tick --          liftIO $ threadDelay $ 10^6 --  0.1   ,     runNode (NodeConfig repl) flowers -- (5)   

  1. In Cloud Haskell, the main functional unit is Process (not to be confused with the OS process). They are based on lightweight green streams and can send messages to other processes ( send function to send to a specific process or P2P.nsendPeers to send to all familiar nodes), receive messages to their mailbox ( expect or receive* function), start other processes (for example, locally with using spawnLocal ), etc.
  2. We need to implement the REPL in a separate thread, otherwise the main thread (node) will be blocked, therefore we need to make a thread-safe interface for GSet so that it can be changed for both the REPL and the node. Since the system is based on actors, we will send messages to change the set and process them sequentially in an infinite loop of processing messages in the main thread.
  3. We run the REPL as a separate cloud-haskell process (i.e., as a green stream), and also pass it the main process Pid (unique process identifier) ​​so that the REPL knows where to send commands entered by the user as messages. Next we get Pid REPL (spawnLocal returns spawnLocal ) to send it answers to commands. The REPL code is here .
  4. How will flower replication work?
    • Each node will periodically send its status to all peers ( broadcast ) - and this together with CRDT solves the replication problem:
      Let there be nodes A and B Suppose A does not have an element x , while B x has one. After B makes a broadcast, A will add x - a consensus has been reached, r.t.
      If we had an ordinary set, but not GSet, then nothing would have happened: Suppose A and B have an element y . Let A remove y . After B makes a broadcast, A will get y back.
    • When we send a message to all the nodes, we must specify the name of the service - in fact, we send a message only to those nodes that have registered themselves in the register as supporting this service. Here we register our node as a supporting service “bees”: register "bees" self .
    • Noda should know when to send others their status. The simplest solution is to do it on a timer: wait a second, and then act, but then we would block the main message processing flow. Here we start the process via spawnLocal , which first sends a Tick message to the main process (when the main process sees the Tick, it sends its state to the nodes), and then waits for 1 second and repeats.

4. Ok, now (finally!) We can proceed to the logic of the main process - node execution code:

 runNode :: NodeConfig -> Flowers -> Process () -- (1)    runNode config@(NodeConfig repl) flowers = do let run = runNode config receiveWait -- (2)   [ match (\command -> -- (3)    -  Command  REPL,  newFlowers <- handleReplCommand config flowers --     run newFlowers) , match (\Tick -> do --   ,        P2P.nsendPeers "bees" flowers --     run flowers) , match (\newFlowers -> do -- -    run $ newFlowers `union` flowers) --     -     ] 

  1. Let's look at the signature: runNode accepts the runNode node configuration — information that will not change during execution. In our case, this is just a Pid REPL. She also takes her current state - GSet flowers. But how to add a flower, because GSet is an immutable data type? Very simple: let's make our function recursive, and with each change of state we will start it again.
  2. receiveWait accepts a list of functions with one argument (incoming message), pulls out a message, and calls a function that matches the message type.
  3. If we received a message of this type: data Command = Add Flower | Show data Command = Add Flower | Show , this is the REPL command. handleReplCommmand - the function to process the command:

     handleReplCommand :: NodeConfig -> Flowers -> Command -> Process Flowers handleReplCommand (NodeConfig repl) flowers (Add flower) = do --      send repl (Added flower) --  REPL ,    return $ S.add flower flowers --       handleReplCommand (NodeConfig repl) flowers Show = do --    send repl (HereUR $ toList flowers) --      return flowers 
  4. If Tick comes from a ticker, then you need to send your status: P2P.nsendPeers "bees" flowers . Here “bees” is the name of the service, that is, we send flowers only to those nodes that have registered themselves as “bees”.
  5. If we received flowers from some other bee, we need to add all unfamiliar flowers to ourselves, that is, simply combine many new ones with many existing ones.

5. That's it! Download the full source code and compile:

 git clone https://github.com/SenchoPens/cloud-bees.git cd cloud-bees stack setup # Stack  GHC stack build #  

Now run this line in one terminal:

 stack exec cloud-bees-exe 9000 9001 2>/dev/null 

And in another this:

 stack exec cloud-bees-exe 9001 9000 2>/dev/null 

REPL will display the prompt. Try adding Add (1, 2) in one terminal, i.e. add a flower with coordinates (1, 2), and in another - Show , and you will see that the second node now has such a flower.




I hope we convinced you that creating Haskell distributed systems is not so scary :)

You can come up with a lot of real juz-cases for a similar system: for example, solving the problem of hares in public transport: a person passing through the transport on the card is marked as entered (add his id to the first GSet), and exit - as released (add the id in second GSet). At night (when the transport is not working) a check occurs - if the person entered and left, he is not a hare.

If you're interested, you can see our more extensive project with encryption, which we did during the shift .



With love, Arseny and company, Grade 9; under the gentle guidance of wldhx

Source: https://habr.com/ru/post/351496/


All Articles