
How to make a distributed build system out of Ninja?

Hi, Habr!

Recently, while picking yet another free build system, I wondered: "Could I just write such a system myself? It seems simple: take Ninja, bolt on a split into preprocessing and compilation, and ship files back and forth over the network. What could be easier?"

Simple does not mean easy. How to build such a system yourself is what I describe below the cut.

Stage 0. Task statement


Disclaimer: the article is marked as a tutorial, but it is not a step-by-step guide whose code you can copy to get a finished product. It is rather a set of directions: how to plan the work and where to dig.
First, let's define what the overall algorithm should look like:


It doesn't look too scary, does it? Still, writing all of this in a single evening probably won't work. We will start with several prototypes, and this article covers them:

  1. Prototype 1. A program that mimics the compiler: it splits the command into two and invokes the real compiler itself.
  2. Prototype 2. On top of that, send the compile command over the network, without the file itself.
  3. Prototype 3. Walk the Ninja build graph and print the commands that could potentially be split.

If you are not using libraries, it is best to develop the prototype on a POSIX-compatible OS.

Stage 1. We split the command line


For the prototype, let's settle on the GCC compiler (or Clang, the difference is small), since its command line is easier to parse.

Let our program be invoked as "test -c hello.cpp -o hello.o". We assume that the "-c" switch (compile to object code) is always followed by the name of the input file, although in reality this is not guaranteed. Also, for the time being, we only work within the local directory.

We will use the popen function to start a process and capture its standard output. It lets us open a process in much the same way as we would open a file.
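The assumption about "-c" can be relaxed later. A minimal sketch, assuming the input is the first non-option argument with a typical source extension; the helpers `looksLikeSource` and `findInputFile` are hypothetical and not part of the prototype:

```cpp
#include <cstddef>
#include <string>
#include <vector>

using StringVector = std::vector<std::string>;

// Returns true if the argument ends with a typical C/C++ source extension.
// The extension list is an assumption for this sketch, not a complete one.
bool looksLikeSource(const std::string & arg)
{
    static const char * const extensions[] = {".c", ".cc", ".cpp", ".cxx"};
    for (const char * ext : extensions) {
        const std::string suffix(ext);
        if (arg.size() > suffix.size()
            && arg.compare(arg.size() - suffix.size(), suffix.size(), suffix) == 0)
            return true;
    }
    return false;
}

// Returns the index of the input file, or -1 if none was found.
// Skips options (anything starting with '-') and the value that follows "-o".
int findInputFile(const StringVector & args)
{
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-o") {
            ++i;  // skip the output filename as well
            continue;
        }
        if (!args[i].empty() && args[i][0] == '-')
            continue;
        if (looksLikeSource(args[i]))
            return static_cast<int>(i);
    }
    return -1;
}
```

For the arguments `-c -O2 hello.cpp -o hello.o` this returns index 2. Real GCC command lines contain other options that take a separate value (for example `-I dir` or `-include file`), and a complete parser would have to skip those too.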

Main.cpp file:

    #include <iostream>
    #include "InvocationRewriter.hpp"
    #include "LocalExecutor.hpp"

    int main(int argc, char ** argv)
    {
        StringVector args;
        for (int i = 1; i < argc; ++i)
            args.emplace_back(argv[i]);

        InvocationRewriter rewriter;
        StringVector ppArgs, ccArgs;
        // Split the arguments into a preprocessing and a compilation command.
        if (!rewriter.SplitInvocation(args, ppArgs, ccArgs))
        {
            std::cerr << "Usage: -c <filename> -o <filename> \n";
            return 1;
        }

        LocalExecutor localExecutor;
        const std::string cxxExecutable = "/usr/bin/g++"; // Hardcoded path, GNU/Linux only.

        // Step 1: preprocess.
        const auto ppResult = localExecutor.Execute(cxxExecutable, ppArgs);
        if (!ppResult.m_result)
        {
            std::cerr << ppResult.m_output;
            return 1;
        }

        // Step 2: compile the preprocessed file.
        const auto ccResult = localExecutor.Execute(cxxExecutable, ccArgs);
        if (!ccResult.m_result)
        {
            std::cerr << ccResult.m_output;
            return 1;
        }

        // The intermediate preprocessed file is left on disk.
        return 0;
    }

InvocationRewriter.hpp code
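    #pragma once
    #include <string>
    #include <vector>
    #include <algorithm>

    using StringVector = std::vector<std::string>;

    class InvocationRewriter
    {
    public:
        bool SplitInvocation(const StringVector & original,
                             StringVector & preprocessor,
                             StringVector & compilation)
        {
            // Find the positions of the -c and -o arguments.
            // We assume that -c is followed by the input filename, which is not always true.
            const auto cIter = std::find(original.cbegin(), original.cend(), "-c");
            const auto oIter = std::find(original.cbegin(), original.cend(), "-o");
            if (cIter == original.cend() || oIter == original.cend())
                return false;

            const auto cIndex = static_cast<std::size_t>(cIter - original.cbegin());
            const auto oIndex = static_cast<std::size_t>(oIter - original.cbegin());
            // Both switches need a value after them, otherwise the indexing below is out of range.
            if (cIndex + 1 >= original.size() || oIndex + 1 >= original.size())
                return false;

            preprocessor = compilation = original;
            const std::string & inputFilename = original[cIndex + 1];
            preprocessor[oIndex + 1] = "pp_" + inputFilename; // The preprocessor writes to pp_<input>.
            preprocessor[cIndex] = "-E";                      // -E means "preprocess only".
            compilation[cIndex + 1] = "pp_" + inputFilename;  // The compiler reads the preprocessed file.
            return true;
        }
    };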


LocalExecutor.hpp code
    #pragma once
    #include <string>
    #include <vector>
    #include <stdio.h>

    using StringVector = std::vector<std::string>;

    class LocalExecutor
    {
    public:
        /// Execution result: captured output + success flag.
        struct ExecutorResult
        {
            std::string m_output;
            bool m_result = false;
            ExecutorResult(const std::string & output = "", bool result = false)
                : m_output(output), m_result(result) {}
        };

        /// Runs the command through popen.
        ExecutorResult Execute(const std::string & executable, const StringVector & args)
        {
            std::string cmd = executable;
            for (const auto & arg : args)
                cmd += " " + arg;
            cmd += " 2>&1"; // Merge stderr into stdout.

            FILE * process = popen(cmd.c_str(), "r");
            if (!process)
                return ExecutorResult("Failed to execute: " + cmd);

            ExecutorResult result;
            char buffer[1024];
            // fgets always null-terminates, so the whole buffer can be used.
            while (fgets(buffer, sizeof(buffer), process) != nullptr)
                result.m_output += buffer;

            // pclose returns the wait status; zero means the command exited successfully.
            result.m_result = pclose(process) == 0;
            return result;
        }
    };
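Joining arguments with spaces breaks as soon as a path contains a space or a shell metacharacter, because popen hands the whole string to the shell. One possible hardening, sketched under the assumption of a POSIX shell; `shellQuote` and `buildCommand` are hypothetical helpers, not part of the prototype:

```cpp
#include <string>
#include <vector>

using StringVector = std::vector<std::string>;

// Wraps an argument in single quotes for a POSIX shell.
// An embedded single quote is written as '\'' (close, escaped quote, reopen).
std::string shellQuote(const std::string & arg)
{
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'')
            quoted += "'\\''";
        else
            quoted += c;
    }
    quoted += "'";
    return quoted;
}

// Builds the string passed to popen: the executable, the quoted arguments,
// and stderr merged into stdout.
std::string buildCommand(const std::string & executable, const StringVector & args)
{
    std::string cmd = shellQuote(executable);
    for (const auto & arg : args)
        cmd += " " + shellQuote(arg);
    cmd += " 2>&1";
    return cmd;
}
```

Note that the Stage 2 server passes the entire argument string as a single element, so with quoting in place it would have to split that string back into separate arguments first.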


Well, now we have a small compiler emulator that invokes the real compiler. Moving on :)

Further development of the prototype:


Stage 2. Network subsystem


The network exchange in the prototype will be built on BSD Sockets (Berkeley Sockets).

A bit of theory:

A socket is literally a "hole" that data can be written to and read from. To connect to a remote server, the algorithm is as follows:
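With plain POSIX calls the usual client-side sequence is: create a socket, connect it, exchange data with send/recv, close it. A minimal sketch of the first two steps; the function name `connectTo` is made up for this example:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Resolves host:port and connects a TCP socket to it.
// Returns the connected descriptor, or -1 on failure.
int connectTo(const std::string & host, std::uint16_t port)
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo * results = nullptr;
    const std::string service = std::to_string(port);
    if (getaddrinfo(host.c_str(), service.c_str(), &hints, &results) != 0)
        return -1;

    int fd = -1;
    for (addrinfo * ai = results; ai != nullptr; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);  // 1. create
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)             // 2. connect
            break;
        close(fd);  // this address did not work, try the next one
        fd = -1;
    }
    freeaddrinfo(results);
    return fd;  // 3. the caller uses send()/recv(), then 4. close()
}
```

A call such as `connectTo("localhost", 6666)` would match the port used by the prototype further down.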


The server has to do a little more work:
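On the server side the usual POSIX sequence is socket, bind, listen, accept, and then reading and writing on the accepted descriptor. A minimal sketch, IPv4 only, with hypothetical function names:

```cpp
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP socket listening on the given port (0 lets the OS pick one).
// Returns the listening descriptor, or -1 on failure.
int listenOn(std::uint16_t port)
{
    const int fd = socket(AF_INET, SOCK_STREAM, 0);  // 1. create
    if (fd < 0)
        return -1;

    const int enable = 1;  // allow a quick restart on the same port
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable));

    sockaddr_in address;
    std::memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr *>(&address), sizeof(address)) != 0  // 2. bind
        || listen(fd, SOMAXCONN) != 0) {                                         // 3. listen
        close(fd);
        return -1;
    }
    return fd;
}

// Returns the port the listening socket is actually bound to, or 0 on failure.
std::uint16_t boundPort(int fd)
{
    sockaddr_in address;
    socklen_t length = sizeof(address);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&address), &length) != 0)
        return 0;
    return ntohs(address.sin_port);
}

// 4. Blocks until a client connects; returns the per-connection descriptor.
int acceptOne(int listenFd)
{
    return accept(listenFd, nullptr, nullptr);
}
```

The extra work compared with the client is exactly the bind/listen/accept part: the listening socket never carries data itself, it only produces new sockets, one per client.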


We need a socket client and a socket server. Let their interface look like this:

    /// Data socket interface.
    class IDataSocket
    {
    public:
        using Ptr = std::shared_ptr<IDataSocket>;

        /// Result of an operation: Success - done, TryAgain - nothing yet, retry later, Fail - the connection is broken.
        enum class WriteState { Success, TryAgain, Fail };
        enum class ReadState { Success, TryAgain, Fail };

    public:
        virtual ~IDataSocket() = default;

        /// Connect to the remote side.
        virtual bool Connect () = 0;

        /// Close the connection.
        virtual void Disconnect () = 0;

        /// Connection state: established; still being set up.
        virtual bool IsConnected () const = 0;
        virtual bool IsPending() const = 0;

        /// Read the available data into the buffer.
        virtual ReadState Read(ByteArrayHolder & buffer) = 0;

        /// Write the buffer to the socket.
        virtual WriteState Write(const ByteArrayHolder & buffer, size_t maxBytes = size_t(-1)) = 0;
    };

    /// A "listener" socket. Accepts incoming connections.
    class IDataListener
    {
    public:
        using Ptr = std::shared_ptr<IDataListener>;
        virtual ~IDataListener() = default;

        /// Returns the next incoming connection, if any.
        virtual IDataSocket::Ptr GetPendingConnection() = 0;

        /// Start listening:
        virtual bool StartListen() = 0;
    };

I won't include the implementation of this interface in the article; you can write it yourself or take a look here.

Suppose we have the sockets ready. What will the compiler client and server look like?

Server:

    #include <TcpListener.h>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include "LocalExecutor.hpp"

    int main()
    {
        // Listening socket parameters.
        TcpConnectionParams tcpParams;
        tcpParams.SetPoint(6666, "localhost"); // Listen on port 6666.
        auto listener = TcpListener::Create(tcpParams);

        IDataSocket::Ptr connection;
        // Wait for the first incoming connection (busy loop).
        while ((connection = listener->GetPendingConnection()) == nullptr)
            ;

        // Bring the accepted connection into a working state.
        connection->Connect();

        ByteArrayHolder incomingBuffer; //!< A wrapper around std::vector<uint8_t>.
        while (connection->Read(incomingBuffer) == IDataSocket::ReadState::TryAgain)
            ;

        // Assume the whole command arrived in a single read.
        std::string args((const char*)(incomingBuffer.data()), incomingBuffer.size());
        std::replace(args.begin(), args.end(), '\n', ' ');

        LocalExecutor localExecutor;
        const auto result = localExecutor.Execute("/usr/bin/g++", StringVector(1, args));

        std::string stdOutput = result.m_output;
        if (stdOutput.empty())
            stdOutput = "OK\n"; // An empty compiler output means success, so reply with OK.

        // Send the reply back to the client.
        ByteArrayHolder outgoingBuffer;
        std::copy(stdOutput.cbegin(), stdOutput.cend(), std::back_inserter(outgoingBuffer.ref()));
        connection->Write(outgoingBuffer);
        connection->Disconnect();

        // The prototype serves a single request and exits;
        // a real service would keep accepting connections.
        return 0;
    }

Client:

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <TcpSocket.h>
    #include "InvocationRewriter.hpp"
    #include "LocalExecutor.hpp"

    int main(int argc, char ** argv)
    {
        StringVector args;
        for (int i = 1; i < argc; ++i)
            args.emplace_back(argv[i]);

        InvocationRewriter rewriter;
        StringVector ppArgs, ccArgs;
        // Split the arguments into a preprocessing and a compilation command.
        if (!rewriter.SplitInvocation(args, ppArgs, ccArgs))
        {
            std::cerr << "Usage: -c <filename> -o <filename> \n";
            return 1;
        }

        // Preprocessing still runs locally.
        LocalExecutor localExecutor;
        const std::string cxxExecutable = "/usr/bin/g++"; // Hardcoded path, GNU/Linux only.
        const auto ppResult = localExecutor.Execute(cxxExecutable, ppArgs);
        if (!ppResult.m_result)
        {
            std::cerr << ppResult.m_output;
            return 1;
        }

        // Connect to the server on port 6666.
        TcpConnectionParams tcpParams;
        tcpParams.SetPoint(6666, "localhost");
        auto connection = TcpSocket::Create(tcpParams);
        connection->Connect();

        ByteArrayHolder outgoingBuffer;
        for (auto arg : ccArgs)
        {
            arg += " "; // Separate the arguments with spaces.
            std::copy(arg.cbegin(), arg.cend(), std::back_inserter(outgoingBuffer.ref()));
        }
        connection->Write(outgoingBuffer);

        // Wait for the server's reply (busy loop).
        ByteArrayHolder incomingBuffer;
        while (connection->Read(incomingBuffer) == IDataSocket::ReadState::TryAgain)
            ;

        std::string response((const char*)(incomingBuffer.data()), incomingBuffer.size());
        if (response != "OK\n")
        {
            std::cerr << response;
            return 1;
        }
        return 0;
    }

Yes, not all sources are shown (TcpConnectionParams and ByteArrayHolder, for example, are missing), but these are fairly primitive structures.
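The `while (... == TryAgain) ;` loops in both programs spin at full speed and never give up. A possible replacement is a polling helper with a pause and a deadline. This is only a sketch: `readWithTimeout` is a hypothetical name, and the local `ReadState` enum merely mirrors `IDataSocket::ReadState` so that the example compiles on its own:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Mirrors IDataSocket::ReadState from the interface above.
enum class ReadState { Success, TryAgain, Fail };

// Calls readOnce until it stops returning TryAgain or the timeout expires.
// Sleeps between attempts so that waiting does not occupy a whole CPU core.
// Returns Fail on timeout.
ReadState readWithTimeout(const std::function<ReadState()> & readOnce,
                          std::chrono::milliseconds timeout,
                          std::chrono::milliseconds pause = std::chrono::milliseconds(1))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        const ReadState state = readOnce();
        if (state != ReadState::TryAgain)
            return state;
        if (std::chrono::steady_clock::now() >= deadline)
            return ReadState::Fail;
        std::this_thread::sleep_for(pause);
    }
}
```

In the client, the lambda passed as `readOnce` would wrap `connection->Read(incomingBuffer)`. A production implementation would more likely block in select or poll instead of sleeping.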

After debugging this prototype, we have a small service that can compile preprocessed files on the local machine (with some assumptions, for example, that the working directories of the client and the server are the same).
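One more assumption is hidden in both programs: that a single Read returns the whole message. TCP is a byte stream and does not preserve message boundaries, so a longer command or compiler log may arrive in pieces. A common remedy is a length prefix. A minimal sketch; the 4-byte big-endian header and the names `encodeFrame` and `decodeFrame` are choices made for this example and not necessarily what Wuild does:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Prepends a 4-byte big-endian length to the payload.
Bytes encodeFrame(const std::string & payload)
{
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    Bytes frame;
    frame.reserve(4 + payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Tries to extract one complete message from the front of the receive buffer.
// Returns false (and leaves the buffer untouched) if more bytes are needed.
bool decodeFrame(Bytes & buffer, std::string & payload)
{
    if (buffer.size() < 4)
        return false;
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i)
        size = (size << 8) | buffer[i];
    if (buffer.size() - 4 < size)
        return false;
    payload.assign(buffer.begin() + 4, buffer.begin() + 4 + size);
    buffer.erase(buffer.begin(), buffer.begin() + 4 + size);
    return true;
}
```

The receiver appends every chunk it reads to one buffer and calls `decodeFrame` until it returns false; whatever is left over belongs to the next message.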

Further development of the prototype:


Stage 3. Integration with Ninja


To get started, you need to familiarize yourself with how Ninja works. I assume you have already built some projects with it and have a rough idea of what build.ninja looks like.
Concepts used:


Roughly what it looks like if you draw the dependencies:



This shows the build graph for two translation units that are linked into an application.

As we can see, to hook our changes into the build system we need to rewrite the State, splitting Edges in two in the right places and adding new nodes (the preprocessed files).
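Before touching Ninja's real classes, the transformation can be tried on a toy model. The sketch below uses made-up `ToyNode`, `ToyEdge` and `ToyGraph` types (the real Node, Edge and State carry much more data) and replaces one "source -> object" edge with a preprocessing edge and a remote-capable compile edge:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct ToyEdge;
struct ToyNode {
    std::string path;
    ToyEdge * inEdge = nullptr;  // the edge that produces this file, if any
};
struct ToyEdge {
    std::string rule;
    std::vector<ToyNode *> inputs;
    std::vector<ToyNode *> outputs;
    bool isRemote = false;
};
struct ToyGraph {
    std::vector<std::unique_ptr<ToyNode>> nodes;
    std::vector<std::unique_ptr<ToyEdge>> edges;

    ToyNode * addNode(const std::string & path) {
        nodes.push_back(std::unique_ptr<ToyNode>(new ToyNode));
        nodes.back()->path = path;
        return nodes.back().get();
    }
    ToyEdge * addEdge(const std::string & rule) {
        edges.push_back(std::unique_ptr<ToyEdge>(new ToyEdge));
        edges.back()->rule = rule;
        return edges.back().get();
    }
};

// Replaces `original` with two edges joined by a new ".pp" node.
void splitCompileEdge(ToyGraph & graph, ToyEdge * original)
{
    assert(!original->inputs.empty() && !original->outputs.empty());
    ToyNode * preprocessed = graph.addNode(original->inputs.front()->path + ".pp");

    // Preprocessing edge: takes over the inputs, produces the .pp file.
    ToyEdge * pp = graph.addEdge(original->rule + "_pp");
    pp->inputs = original->inputs;
    pp->outputs.push_back(preprocessed);
    preprocessed->inEdge = pp;

    // Compile edge: consumes the .pp file, takes over the outputs.
    ToyEdge * cc = graph.addEdge(original->rule + "_cc");
    cc->inputs.push_back(preprocessed);
    cc->outputs = original->outputs;
    cc->isRemote = true;
    for (ToyNode * out : cc->outputs)
        out->inEdge = cc;

    // Drop the original edge from the graph.
    graph.edges.erase(
        std::remove_if(graph.edges.begin(), graph.edges.end(),
                       [original](const std::unique_ptr<ToyEdge> & e) { return e.get() == original; }),
        graph.edges.end());
}
```

After the call, a graph holding a single `hello.cpp -> hello.o` edge contains three nodes and two edges, and only the second edge is marked as remote.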
Suppose we already have the Ninja sources, we have compiled them, and the resulting binary works.
Add the following code snippet to ninja.cc:

    // Limit number of rebuilds, to prevent infinite loops.
    const int kCycleLimit = 100;
    for (int cycle = 1; cycle <= kCycleLimit; ++cycle) {
      NinjaMain ninja(ninja_command, config);

      ManifestParser parser(&ninja.state_, &ninja.disk_interface_,
                            options.dupe_edges_should_err
                                ? kDupeEdgeActionError
                                : kDupeEdgeActionWarn);
      string err;
      if (!parser.Load(options.input_file, &err)) {
        Error("%s", err.c_str());
        return 1;
      }

      // Our addition: rewrite the rules right after the manifest is loaded.
      RewriteStateRules(&ninja.state_);

      // ... the rest of the loop is unchanged ...

The RewriteStateRules function itself can live in a separate file or be defined right here in ninja.cc:

    #include <map>
    #include <set>
    #include <string>
    #include <vector>
    #include "InvocationRewriter.hpp"

    // Describes how one Ninja rule is replaced by a preprocess rule and a compile rule.
    struct RuleReplace
    {
        const Rule* pp;
        const Rule* cc;
        std::string toolId;
        RuleReplace() = default;
        RuleReplace(const Rule* pp_, const Rule* cc_, std::string id)
            : pp(pp_), cc(cc_), toolId(id) {}
    };

    void RewriteStateRules(State *state)
    {
        // Take a copy of the rules, because new ones are added while we iterate.
        const auto rules = state->bindings_.GetRules();
        std::map<const Rule*, RuleReplace> ruleReplacement;
        InvocationRewriter rewriter;

        // Go through all the rules.
        for (const auto & ruleIt : rules)
        {
            const Rule * rule = ruleIt.second;
            const EvalString* command = rule->GetBinding("command");
            if (!command)
                continue;

            // Convert the command into a form the rewriter understands.
            std::vector<std::string> originalRule;
            for (const auto & strPair : command->parsed_)
            {
                std::string str = strPair.first;
                if (strPair.second == EvalString::SPECIAL)
                    str = '$' + str;
                originalRule.push_back(str);
            }

            // Try to split the command:
            std::vector<std::string> preprocessRule, compileRule;
            if (rewriter.SplitInvocation(originalRule, preprocessRule, compileRule))
            {
                // Create 2 new rules, rulePP and ruleCC, and add them to bindings_.
                // Record them in ruleReplacement (ruleReplacement[rule] = ...)
            }
        }

        const auto paths = state->paths_;
        std::set<Edge*> erasedEdges;

        // Go through all the nodes.
        for (const auto & iter : paths)
        {
            Node* node = iter.second;
            Edge* in_edge = node->in_edge();
            if (!in_edge)
                continue; // Nothing produces this node (e.g. a source file), skip it.

            // Check whether the rule of the incoming edge has a replacement:
            const Rule * in_rule = &(in_edge->rule());
            auto replacementIt = ruleReplacement.find(in_rule);
            if (replacementIt != ruleReplacement.end())
            {
                RuleReplace replacement = replacementIt->second;
                const std::string objectPath = node->path();
                const std::string sourcePath = in_edge->inputs_[0]->path();
                const std::string ppPath = sourcePath + ".pp";

                // Create a node for the preprocessed file.
                Node *pp_node = state->GetNode(ppPath, node->slash_bits());

                // Create two new edges.
                Edge* edge_pp = state->AddEdge(replacement.pp);
                Edge* edge_cc = state->AddEdge(replacement.cc);

                // ... omitted here ...
                // move the inputs to edge_pp;
                // move the outputs to edge_cc;
                // link the two edges through pp_node.

                // Mark edge_cc as remotely executable. is_remote_ is not a stock
                // Ninja field, it has to be added to Edge:
                edge_cc->is_remote_ = true;

                // Detach the old edge, it is no longer needed.
                in_edge->outputs_.clear();
                in_edge->inputs_.clear();
                in_edge->env_ = nullptr;
                erasedEdges.insert(in_edge);
            }
        }

        // Drop the detached edges.
        std::vector<Edge*> newEdges;
        for (auto * edge : state->edges_)
        {
            if (erasedEdges.find(edge) == erasedEdges.end())
                newEdges.push_back(edge);
        }
        state->edges_ = newEdges;
    }

Some tedious fragments are cut; the full code can be viewed here.

Further development of the prototype:


Stage X. What's next?


Once the prototype works, move in small steps toward a real product:


All in all, folks, this is roughly where I stopped: I made the open-source project Wuild (source code here), under the Apache license, which implements all of the above. Writing it took about 150 hours of free time (in case anyone decides to follow my path). I strongly recommend relying on existing free libraries as much as possible, so that you can concentrate on the business logic instead of debugging networking or process launching.

What Wuild can do:


That is pretty much it; the project is somewhere between alpha and beta (stability is there, features are not :D). I am not posting benchmarks (I don't want to advertise), but compared with one similar product I was more than satisfied with the speed.

The article is mostly educational, and the project is more of a cautionary tale (how not to do things, in the sense of NIH syndrome: reinvent fewer wheels).

Anyone who wants to: fork it, send pull requests, use it for any scary purposes!

Source: https://habr.com/ru/post/321660/

