How to make a distributed build system out of Ninja?

Hi, Habr!

Recently, while picking yet another free build system, I caught myself thinking: "Could I just write such a system myself? It's simple: take the same Ninja, bolt on a split into preprocessing and compilation, and ship files back and forth over the network. What could be easier?"

Simple does not mean easy. How to build such a system yourself is what I describe below the cut.

Stage 0. Problem statement

Disclaimer: the article is marked as a tutorial, but it is not a step-by-step guide whose code you can copy to get a finished product. It is rather an instruction on how to plan the work and where to dig.

First, let's define the overall algorithm:

Read the build graph and pick out the compilation commands;
Split each compilation into two stages: preprocessing and the actual code generation. Mark the latter as eligible for remote execution;
Run the preprocessor and read the result into memory;
Send the preprocessed file and the code generation command to another host over the network;
On that host, run the code generation command, read the object file, and send it back as the response;
Save the received object file to disk and print the compiler messages to the console.

It does not look that scary, right? Still, you probably will not write all of it in one evening. We will start with several prototypes, which this article covers:

Prototype 1. A program that mimics the compiler: it splits the command in two and invokes the real compiler itself.
Prototype 2. On top of that, send the compilation command over the network, without the file itself.
Prototype 3. Walk the Ninja build graph and print the commands that can potentially be split.

If you are not using libraries, I recommend developing the prototype on a POSIX-compatible OS.

Stage 1. Splitting the command line

For the prototype, let's settle on the GCC compiler (or Clang, there is not much difference), since its command line is easier to parse.

Suppose our program is invoked as "test -c hello.cpp -o hello.o". We assume that the "-c" switch (compile to object code) is always followed by the name of the input file, although in reality this is not always the case. Also, for now we only work within the local directory.

We will use the popen function to start a process and capture its standard output. It lets us open a process much like we would open a file.

Main.cpp file:

#include <iostream>
#include <string>
#include "InvocationRewriter.hpp"
#include "LocalExecutor.hpp"

int main(int argc, char ** argv)
{
    StringVector args;
    for (int i = 1; i < argc; ++i)
        args.emplace_back(argv[i]);

    InvocationRewriter rewriter;
    StringVector ppArgs, ccArgs; // Preprocessing and compilation arguments.
    if (!rewriter.SplitInvocation(args, ppArgs, ccArgs))
    {
        std::cerr << "Usage: -c <filename> -o <filename> \n";
        return 1;
    }

    LocalExecutor localExecutor;
    const std::string cxxExecutable = "/usr/bin/g++"; // Hard-coded; assumes a typical GNU/Linux setup.

    // Stage 1: preprocessing.
    const auto ppResult = localExecutor.Execute(cxxExecutable, ppArgs);
    if (!ppResult.m_result)
    {
        std::cerr << ppResult.m_output;
        return 1;
    }

    // Stage 2: compilation of the preprocessed file.
    const auto ccResult = localExecutor.Execute(cxxExecutable, ccArgs);
    if (!ccResult.m_result)
    {
        std::cerr << ccResult.m_output;
        return 1;
    }

    // On success nothing is printed, so compiler warnings are lost in this prototype.
    return 0;
}

InvocationRewriter.hpp code

#pragma once
#include <algorithm>
#include <string>
#include <vector>

using StringVector = std::vector<std::string>;

class InvocationRewriter
{
public:
    bool SplitInvocation(const StringVector & original,
                         StringVector & preprocessor,
                         StringVector & compilation)
    {
        // Look for the -c and -o flags.
        // Assumption: -c is immediately followed by the input file, -o by the output file.
        const auto cIter = std::find(original.cbegin(), original.cend(), "-c");
        const auto oIter = std::find(original.cbegin(), original.cend(), "-o");
        if (cIter == original.cend() || oIter == original.cend())
            return false;
        // Each flag must be followed by one more argument, otherwise indexing below goes out of range.
        if (cIter + 1 == original.cend() || oIter + 1 == original.cend())
            return false;

        const auto cIndex = cIter - original.cbegin();
        const auto oIndex = oIter - original.cbegin();
        preprocessor = compilation = original;
        const std::string & inputFilename = original[cIndex + 1];
        preprocessor[oIndex + 1] = "pp_" + inputFilename; // The preprocessor writes to pp_<input>.
        preprocessor[cIndex] = "-E";                      // Preprocess only instead of compiling.
        compilation[cIndex + 1] = "pp_" + inputFilename;  // The compiler reads the preprocessed file.
        return true;
    }
};

LocalExecutor.hpp code

#pragma once
#include <stdio.h> // popen/pclose are POSIX functions.
#include <string>
#include <vector>

using StringVector = std::vector<std::string>;

class LocalExecutor
{
public:
    /// Execution result: captured output + success flag.
    struct ExecutorResult
    {
        std::string m_output;
        bool m_result = false;
        ExecutorResult(const std::string & output = "", bool result = false)
            : m_output(output), m_result(result) {}
    };

    /// Runs the command through popen and collects everything it prints.
    ExecutorResult Execute(const std::string & executable, const StringVector & args)
    {
        std::string cmd = executable;
        for (const auto & arg : args)
            cmd += " " + arg; // No shell quoting: arguments containing spaces will break.
        cmd += " 2>&1"; // Redirect stderr to stdout.

        FILE * process = popen(cmd.c_str(), "r");
        if (!process)
            return ExecutorResult("Failed to execute:" + cmd);

        ExecutorResult result;
        char buffer[1024];
        while (fgets(buffer, sizeof(buffer), process) != nullptr)
            result.m_output += buffer;
        result.m_result = pclose(process) == 0;
        return result;
    }
};

Well, now we have a small compiler emulator that calls a real compiler. Let's move on :)

Further development of the prototype:

Support absolute file names;
Use one of the libraries for working with processes: Boost.Process, QProcess, or Ninja's Subprocess;
Implement command splitting for MSVC;
Make the command execution API asynchronous and run commands in a separate thread.

Stage 2. Network subsystem

We will build the network exchange prototype on BSD sockets (Berkeley sockets).

A bit of theory:

A socket is literally a "hole" into which data can be written and from which it can be read. To connect to a remote server, the algorithm is as follows:

Create a socket of the desired type (TCP) with socket();
After creation, set the necessary options with setsockopt() (non-blocking mode, if needed, is switched on through fcntl());
Obtain the address in the format BSD sockets expect with getaddrinfo();
Connect to the TCP host with connect(), passing it the prepared address;
Call recv()/send() (or read()/write()) to read and write;
When done, call close().

The server is a bit more involved:

Create a socket with socket();
Set options;
Call bind() to bind the socket to a specific address (obtained through getaddrinfo());
Start listening on the port by calling listen();
Accept incoming connections with accept(), which returns a new socket;
Perform read/write operations on the accepted socket;
Close the connection socket and the listening socket with close().
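The two step lists above can be sketched with raw POSIX calls roughly like this. It is a minimal illustration, not the code used later in the article: the helper names StartListen, BoundPort, ConnectTo and LoopbackRoundTrip are made up for this example, everything is blocking and IPv4-only, and error handling is reduced to returning -1 or an empty string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Server side: socket() -> setsockopt() -> bind() -> listen().
// Returns the listening descriptor or -1. Port "0" lets the OS pick a free port.
int StartListen(const char* host, const char* port) {
    addrinfo hints{};
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* info = nullptr;
    if (getaddrinfo(host, port, &hints, &info) != 0) return -1;
    int fd = socket(info->ai_family, info->ai_socktype, info->ai_protocol);
    int yes = 1;
    const bool ok = fd >= 0
        && setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == 0
        && bind(fd, info->ai_addr, info->ai_addrlen) == 0
        && listen(fd, 16) == 0;
    freeaddrinfo(info);
    if (!ok && fd >= 0) { close(fd); fd = -1; }
    return fd;
}

// Reports the port a bound socket actually received (useful with port "0").
uint16_t BoundPort(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) return 0;
    return ntohs(addr.sin_port);
}

// Client side: getaddrinfo() -> socket() -> connect().
int ConnectTo(const char* host, const char* port) {
    addrinfo hints{};
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* info = nullptr;
    if (getaddrinfo(host, port, &hints, &info) != 0) return -1;
    int fd = socket(info->ai_family, info->ai_socktype, info->ai_protocol);
    if (fd >= 0 && connect(fd, info->ai_addr, info->ai_addrlen) != 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(info);
    return fd;
}

// Single-process demo: the kernel queues the connection in the listen backlog,
// so connect() completes before accept() is called. A real implementation
// would loop on send()/recv() until all bytes are transferred.
std::string LoopbackRoundTrip(const std::string& message) {
    const int listener = StartListen("127.0.0.1", "0");
    if (listener < 0) return "";
    const std::string port = std::to_string(BoundPort(listener));
    const int client = ConnectTo("127.0.0.1", port.c_str());
    const int accepted = client >= 0 ? accept(listener, nullptr, nullptr) : -1;
    std::string received;
    if (accepted >= 0 &&
        send(client, message.data(), message.size(), 0) == static_cast<ssize_t>(message.size())) {
        char buffer[256];
        const ssize_t n = recv(accepted, buffer, sizeof(buffer), 0);
        if (n > 0) received.assign(buffer, static_cast<size_t>(n));
    }
    if (accepted >= 0) close(accepted);
    if (client >= 0) close(client);
    close(listener);
    return received;
}
```

Calling LoopbackRoundTrip("g++ -c pp_hello.cpp -o hello.o") should return the same string: the "client" half sends it and the "server" half receives it on the accepted socket.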

We need a socket client and a socket server. Let their interface look like this:

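#include <cstddef>
#include <memory>

/// Data socket interface.
class IDataSocket
{
public:
    using Ptr = std::shared_ptr<IDataSocket>;
    /// Operation result: Success - done, TryAgain - nothing transferred yet, retry later, Fail - the connection failed.
    enum class WriteState { Success, TryAgain, Fail };
    enum class ReadState { Success, TryAgain, Fail };

public:
    virtual ~IDataSocket() = default;
    /// Connect to the remote host.
    virtual bool Connect() = 0;
    /// Close the connection.
    virtual void Disconnect() = 0;
    /// Connection state: established; still being established.
    virtual bool IsConnected() const = 0;
    virtual bool IsPending() const = 0;
    /// Read the available data into the buffer.
    virtual ReadState Read(ByteArrayHolder & buffer) = 0;
    /// Write the buffer to the socket.
    virtual WriteState Write(const ByteArrayHolder & buffer, size_t maxBytes = size_t(-1)) = 0;
};

/// A "listener": accepts incoming connections and hands them out as sockets.
class IDataListener
{
public:
    using Ptr = std::shared_ptr<IDataListener>;
    virtual ~IDataListener() = default;
    /// Get the next incoming connection, if there is one.
    virtual IDataSocket::Ptr GetPendingConnection() = 0;
    /// Start listening on the port:
    virtual bool StartListen() = 0;
};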

I will not include the implementation of this interface in the article; you can write it yourself or peek at an existing one here.

Suppose the socket is ready. What will the compiler client and server look like?

Server:

#include <TcpListener.h>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include "LocalExecutor.hpp"

int main()
{
    // Listening parameters.
    TcpConnectionParams tcpParams;
    tcpParams.SetPoint(6666, "localhost"); // Listen on port 6666.
    auto listener = TcpListener::Create(tcpParams);

    IDataSocket::Ptr connection;
    // Busy-wait for the first incoming connection.
    while ((connection = listener->GetPendingConnection()) == nullptr)
        ;

    // Activate the accepted socket.
    connection->Connect();

    ByteArrayHolder incomingBuffer; //!< Essentially a wrapper around std::vector<uint8_t>.
    while (connection->Read(incomingBuffer) == IDataSocket::ReadState::TryAgain)
        ;

    // We assume the whole command arrived in a single read; turn it back into one command line.
    std::string args((const char*)(incomingBuffer.data()), incomingBuffer.size());
    std::replace(args.begin(), args.end(), '\n', ' ');

    LocalExecutor localExecutor;
    const auto result = localExecutor.Execute("/usr/bin/g++", StringVector(1, args));
    std::string stdOutput = result.m_output;
    if (stdOutput.empty())
        stdOutput = "OK\n"; // The compiler printed nothing, so reply with OK.

    // Send the compiler output back to the client.
    ByteArrayHolder outgoingBuffer;
    std::copy(stdOutput.cbegin(), stdOutput.cend(), std::back_inserter(outgoingBuffer.ref()));
    connection->Write(outgoingBuffer);
    connection->Disconnect();

    // The prototype serves a single request and exits.
    // A real server would keep accepting and serving connections in a loop.
    return 0;
}

Client:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <TcpSocket.h>
#include "InvocationRewriter.hpp"
#include "LocalExecutor.hpp"

int main(int argc, char ** argv)
{
    StringVector args;
    for (int i = 1; i < argc; ++i)
        args.emplace_back(argv[i]);

    InvocationRewriter rewriter;
    StringVector ppArgs, ccArgs; // Preprocessing and compilation arguments.
    if (!rewriter.SplitInvocation(args, ppArgs, ccArgs))
    {
        std::cerr << "Usage: -c <filename> -o <filename> \n";
        return 1;
    }

    // Preprocessing is always done locally.
    LocalExecutor localExecutor;
    const std::string cxxExecutable = "/usr/bin/g++"; // Hard-coded; assumes a typical GNU/Linux setup.
    const auto ppResult = localExecutor.Execute(cxxExecutable, ppArgs);
    if (!ppResult.m_result)
    {
        std::cerr << ppResult.m_output;
        return 1;
    }

    // Connect to the server on port 6666.
    TcpConnectionParams tcpParams;
    tcpParams.SetPoint(6666, "localhost");
    auto connection = TcpSocket::Create(tcpParams);
    connection->Connect();

    ByteArrayHolder outgoingBuffer;
    for (auto arg : ccArgs)
    {
        arg += " "; // Separate the arguments with spaces.
        std::copy(arg.cbegin(), arg.cend(), std::back_inserter(outgoingBuffer.ref()));
    }
    connection->Write(outgoingBuffer);

    // Wait for the reply: either "OK" or the compiler output.
    ByteArrayHolder incomingBuffer;
    while (connection->Read(incomingBuffer) == IDataSocket::ReadState::TryAgain)
        ;
    std::string response((const char*)(incomingBuffer.data()), incomingBuffer.size());
    if (response != "OK\n")
    {
        std::cerr << response;
        return 1;
    }
    return 0;
}

Yes, not all the sources are shown (for example, TcpConnectionParams and ByteArrayHolder), but those are fairly primitive structures.

After debugging this prototype, we have a small service that can compile preprocessed files on the local machine (under some assumptions, for example that the client and the server share the same working directory).

Further development of the prototype:

I strongly recommend using one of the existing network libraries (Boost.Asio or QTcpSocket from QtNetwork), and considering Protobuf or a similar tool for serialization;
Implement file transfer over the network. Most likely you will have to split files into fragments, but that depends on the library you choose;
Think about an asynchronous API for sending and receiving messages. Ideally it should be abstract and not tied to sockets at all.
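One common way to send files in fragments over a byte stream is length-prefixed framing. The sketch below is only an illustration of that idea, not what Wuild or any of the libraries above actually does: EncodeFrame and FrameDecoder are hypothetical names, and the 4-byte big-endian length header is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using ByteArray = std::vector<uint8_t>;

// Prefix the payload with its length as a 4-byte big-endian integer.
ByteArray EncodeFrame(const std::string& payload) {
    const uint32_t size = static_cast<uint32_t>(payload.size());
    ByteArray frame = {uint8_t(size >> 24), uint8_t(size >> 16),
                       uint8_t(size >> 8), uint8_t(size)};
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Accumulates fragments of any size and yields complete payloads.
class FrameDecoder {
public:
    // Append the bytes that a single socket read produced.
    void Feed(const uint8_t* data, size_t size) {
        m_buffer.insert(m_buffer.end(), data, data + size);
    }

    // Extract the next complete payload; returns false if more bytes are needed.
    bool Next(std::string& payload) {
        if (m_buffer.size() < 4) return false;
        const uint32_t size = (uint32_t(m_buffer[0]) << 24) | (uint32_t(m_buffer[1]) << 16)
                            | (uint32_t(m_buffer[2]) << 8) | uint32_t(m_buffer[3]);
        if (m_buffer.size() - 4 < size) return false;
        payload.assign(m_buffer.begin() + 4, m_buffer.begin() + 4 + size);
        m_buffer.erase(m_buffer.begin(), m_buffer.begin() + 4 + size);
        return true;
    }

private:
    ByteArray m_buffer;
};
```

The sender writes EncodeFrame(fileContents) to the socket; the receiver calls Feed() after every read and Next() until it returns false. With this in place, the "assume everything arrived in one read" shortcut from the prototype is no longer needed.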

Stage 3. Integration with Ninja

To get started, you need to get familiar with the principles behind Ninja. I assume you have already built some projects with it and roughly know what build.ninja looks like.
Concepts used:

A node (Node) is just a file. Inputs (sources) and outputs (object files) are all nodes, that is, vertices of the graph.
A rule (Rule) is essentially just a command with an argument template. For example, a gcc invocation is a rule, and its arguments are $FLAGS $INCLUDES $DEFINES plus some other common arguments.
An edge (Edge). It was a little surprising for me, but an edge connects not two nodes but several input nodes and one or more output nodes, through a Rule. The whole build system is based on walking the graph in order and executing the commands of the edges. Once all the edges are processed, the project is built.
A state (State) is a container with all of the above, which the build system works with.

Roughly how it looks if you draw the dependencies:

This shows the build graph for two translation units that are linked into an application.

As we can see, to make our changes to the build system we need to rewrite the State: split Edges in two in the right places and add new nodes (the preprocessed files).
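To make the idea of splitting an edge concrete, here is a toy model. It deliberately does not use Ninja's real classes: ToyEdge, ToyState and SplitCompileEdges are invented for this illustration, nodes are reduced to plain path strings, and the ".pp" suffix and the "_pp"/"_cc" rule names are just conventions picked for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these structs imitate the concepts, not Ninja's real classes.
struct ToyEdge {
    std::string rule;                  // name of the rule (command template)
    std::vector<std::string> inputs;   // input nodes (file paths)
    std::vector<std::string> outputs;  // output nodes (file paths)
    bool remote;                       // may this edge run on another host?
};

struct ToyState {
    std::vector<ToyEdge> edges;
};

// Replaces every edge built by `compileRule` with a local preprocessing edge
// and a remote-capable compilation edge, connected through a new ".pp" node.
void SplitCompileEdges(ToyState& state, const std::string& compileRule) {
    std::vector<ToyEdge> result;
    for (const ToyEdge& edge : state.edges) {
        if (edge.rule != compileRule || edge.inputs.empty()) {
            result.push_back(edge);  // e.g. the link step stays untouched
            continue;
        }
        const std::string ppPath = edge.inputs[0] + ".pp";
        result.push_back({compileRule + "_pp", edge.inputs, {ppPath}, false});
        result.push_back({compileRule + "_cc", {ppPath}, edge.outputs, true});
    }
    state.edges = result;
}
```

For a graph with the edges "cxx: main.cpp -> main.o" and "link: main.o -> app", the call SplitCompileEdges(state, "cxx") leaves three edges: "cxx_pp: main.cpp -> main.cpp.pp" (local), "cxx_cc: main.cpp.pp -> main.o" (remote-capable) and the unchanged link edge.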
Suppose we already have the Ninja sources, we can compile them, and the resulting binary works.
Add the following code fragment to ninja.cc:

// Limit number of rebuilds, to prevent infinite loops.
const int kCycleLimit = 100;
for (int cycle = 1; cycle <= kCycleLimit; ++cycle) {
    NinjaMain ninja(ninja_command, config);

    ManifestParser parser(&ninja.state_, &ninja.disk_interface_,
                          options.dupe_edges_should_err
                              ? kDupeEdgeActionError
                              : kDupeEdgeActionWarn);
    string err;
    if (!parser.Load(options.input_file, &err)) {
        Error("%s", err.c_str());
        return 1;
    }

    // Our addition: rewrite the rules right after the manifest is loaded:
    RewriteStateRules(&ninja.state_);
    // ... the rest of the loop body stays as it was ...

The RewriteStateRules function itself can be moved to a separate file or declared right here in ninja.cc as:

#include "InvocationRewriter.hpp" // The remaining includes come from Ninja itself.

struct RuleReplace
{
    const Rule* pp = nullptr;
    const Rule* cc = nullptr;
    std::string toolId;
    RuleReplace() = default;
    RuleReplace(const Rule* pp_, const Rule* cc_, std::string id) : pp(pp_), cc(cc_), toolId(id) {}
};

void RewriteStateRules(State *state)
{
    // Take a copy of the rules: new rules are added to the bindings while we iterate.
    const auto rules = state->bindings_.GetRules();
    std::map<const Rule*, RuleReplace> ruleReplacement;
    InvocationRewriter rewriter;

    // Go through all the existing rules.
    for (const auto & ruleIt : rules)
    {
        const Rule * rule = ruleIt.second;
        const EvalString* command = rule->GetBinding("command");
        if (!command)
            continue;

        // Convert the command into a token list the rewriter understands.
        std::vector<std::string> originalRule;
        for (const auto & strPair : command->parsed_)
        {
            std::string str = strPair.first;
            if (strPair.second == EvalString::SPECIAL)
                str = '$' + str;
            originalRule.push_back(str);
        }

        // Try to split the command:
        std::vector<std::string> preprocessRule, compileRule;
        if (rewriter.SplitInvocation(originalRule, preprocessRule, compileRule))
        {
            // Create 2 new rules, rulePP and ruleCC, and add them to bindings_.
            // Remember them in ruleReplacement (ruleReplacement[rule] = ...)
        }
    }

    const auto paths = state->paths_;
    std::set<Edge*> erasedEdges;

    // Go through all the nodes of the graph.
    for (const auto & iter : paths)
    {
        Node* node = iter.second;
        Edge* inEdge = node->in_edge();
        if (!inEdge)
            continue; // No edge produces this node, so it is a source file.

        // If the edge producing this node uses a rule we have split, split the edge too:
        const Rule * inRule = &(inEdge->rule());
        auto replacementIt = ruleReplacement.find(inRule);
        if (replacementIt != ruleReplacement.end())
        {
            RuleReplace replacement = replacementIt->second;
            const std::string objectPath = node->path();
            const std::string sourcePath = inEdge->inputs_[0]->path();
            const std::string ppPath = sourcePath + ".pp"; // Path of the preprocessed file.
            Node *pp_node = state->GetNode(ppPath, node->slash_bits());

            // Create the two new edges.
            Edge* edge_pp = state->AddEdge(replacement.pp);
            Edge* edge_cc = state->AddEdge(replacement.cc);
            // ... omitted ...
            // The inputs of the original edge go to edge_pp;
            // its outputs go to edge_cc;
            // the two edges are connected through pp_node.

            // edge_cc is the one that may run remotely, so mark it
            // (is_remote_ is a field we add to Edge, it is not in stock Ninja):
            edge_cc->is_remote_ = true;

            // Detach the old edge; it is removed from the state below.
            inEdge->outputs_.clear();
            inEdge->inputs_.clear();
            inEdge->env_ = nullptr;
            erasedEdges.insert(inEdge);
        }
    }

    // Drop the replaced edges.
    vector<Edge*> newEdges;
    for (auto * edge : state->edges_)
    {
        if (erasedEdges.find(edge) == erasedEdges.end())
            newEdges.push_back(edge);
    }
    state->edges_ = newEdges;
}

Some tedious fragments are cut; the full code can be viewed here.

Refining the prototype:

Most likely, the first version of InvocationRewriter will not work, and you will have to account for many things. For example, the compilation switch "-c" can be spelled differently (MSVC uses "/c"), not to mention that it does not necessarily come right before the source file.
There may be many additional flags that take a file as their value, so not everything that is "not a flag" is the source file.
Once the split graph is created and builds successfully in two phases (preprocessing, then compilation), remote execution has to be integrated with our network layer. The actual build loop in Ninja lives in build.cc, in the Builder::Build function. By analogy with "if (failures_allowed && command_runner_->CanRunMore())" and "if (pending_commands)", you can add your own stages there for the distributed build.
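A slightly more careful way to find the source file might look like the sketch below. It is an assumption-laden illustration rather than a complete GCC command-line parser: the list of options that take a separate value is deliberately short and incomplete, and FindSourceIndex, kOptionsWithValue and HasSourceExtension are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using StringVector = std::vector<std::string>;

// Incomplete, illustrative list of GCC/Clang options whose value is passed
// as a separate, following argument.
static const std::set<std::string> kOptionsWithValue = {
    "-o", "-I", "-D", "-U", "-include", "-isystem", "-MF", "-MT", "-MQ", "-x"};

static bool HasSourceExtension(const std::string& arg) {
    static const char* const kExtensions[] = {".c", ".cc", ".cpp", ".cxx"};
    for (const char* ext : kExtensions) {
        const std::string e(ext);
        if (arg.size() > e.size() &&
            arg.compare(arg.size() - e.size(), e.size(), e) == 0)
            return true;
    }
    return false;
}

// Returns the index of the source file in args, or -1 if the command is not
// a single-file compilation ("-c" present, exactly one source file).
int FindSourceIndex(const StringVector& args) {
    bool hasCompileFlag = false;
    int sourceIndex = -1;
    for (size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-c") { hasCompileFlag = true; continue; }
        if (kOptionsWithValue.count(arg)) { ++i; continue; }  // skip the value too
        if (!arg.empty() && arg[0] == '-') continue;          // any other flag
        if (HasSourceExtension(arg)) {
            if (sourceIndex != -1) return -1;  // several sources: do not split
            sourceIndex = static_cast<int>(i);
        }
    }
    return hasCompileFlag ? sourceIndex : -1;
}
```

With this helper, "-O2 -Iinclude -o hello.o hello.cpp -c" yields index 4 even though "-c" comes last, and in "-include config.cpp -c main.cpp -o main.o" the file config.cpp is correctly treated as the value of "-include" rather than as the source.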

Stage X. What's next?

After the prototype works, you need to move toward a product in small steps:

Configuration of all modules, both the network subsystem and the InvocationRewriter;
Support for any combination of options under different compilers;
Compression support for file transfer;
Diagnostics of various kinds in the form of logs;
A coordinator that can maintain connections to multiple build servers;
A balancer that takes into account that several clients use the servers at once (and does not overload them);
Integration with other build systems, not just Ninja.
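As a very rough idea of what the balancer's core decision could look like, here is a sketch that picks the least loaded server. The ToolServer structure and the PickServer function are hypothetical and are not taken from Wuild; a real balancer would also have to learn the busy counters from the coordinator and deal with servers going offline.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one build server.
struct ToolServer {
    std::string host;
    int totalSlots;  // how many compilations it may run at once
    int busySlots;   // how many are running right now, from all clients
};

// Picks the server with the lowest load ratio that still has a free slot.
// Returns its index, or -1 when every server is saturated.
int PickServer(const std::vector<ToolServer>& servers) {
    int best = -1;
    for (size_t i = 0; i < servers.size(); ++i) {
        const ToolServer& s = servers[i];
        if (s.busySlots >= s.totalSlots) continue;  // no free slots here
        // Compare busy/total ratios by cross-multiplication to avoid floats.
        if (best == -1 ||
            s.busySlots * servers[best].totalSlots < servers[best].busySlots * s.totalSlots)
            best = static_cast<int>(i);
    }
    return best;
}
```

When PickServer returns -1, the natural fallback is to run the compilation locally or to wait until a slot frees up.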

In general, guys, this is roughly where I stopped: I made the open-source project Wuild (source code here), under the Apache license, which implements all of these things. Writing it took about 150 hours of free time (in case anyone decides to repeat my path). I highly recommend using existing free libraries as much as possible, so that you can concentrate on the business logic instead of debugging networking or process launching.

What Wuild can do:

Distributed build with cross-compilation (Clang) for Win, Mac, Linux;
Integration with Ninja and Make.

And that, in general, is all; the project is somewhere between alpha and beta (stability is there, features are not :D). I am not posting benchmarks (I do not want to advertise), but compared with one of the similar products I was more than satisfied with the speed.

The article is mostly educational, and the project is a cautionary tale (how not to do it, in the sense of NIH syndrome: reinvent fewer wheels).

Anyone who wants to: fork it, send pull requests, use it for any scary purposes!

Source: https://habr.com/ru/post/321660/
