Jinja2 in the world of C ++, part two. Rendering

Jinja2 logo This is the second part of the story about porting the Jinja2 template engine to C ++. The first can be read here: templates of the third order, or as I ported Jinja2 to C ++ . It focuses on the template rendering process. Or, in other words, about writing the interpreter of a python-like language from scratch.

Rendering as such

After parsing, the template turns into a tree containing nodes of three types: simple text , calculated expressions, and control structures . Accordingly, in the rendering process, plain text should be placed in the output stream without any changes, expressions should be calculated, converted into text, which will be placed in the stream, and control structures must be executed. At first glance, there was nothing difficult in realizing the rendering process: you just need to go around all the nodes of the tree, calculate everything, execute everything, and generate text. It's simple. Exactly as long as two conditions are met: a) all work is done with strings of only one type (string or wstring); b) only very simple and basic expressions are used. Actually, it is with such limitations that inja and Jinja2CppLight are implemented. In the case of my Jinja2Cpp, both conditions do not work. First, I initially laid out transparent support for both types of strings. Secondly, all development was started just for the sake of supporting the Jinja2 specification almost in full, and this is, in essence, a full-fledged scripting language. Therefore, I had to dig deeper with rendering than with parsing.

Expression evaluation

A template would not be a template if it could not be parameterized. In principle, Jinja2 allows the option of “in itself” templates - all the necessary variables can be set inside the template itself, and then rendered. But the work in the template with the parameters obtained "outside" remains the main case. Thus, the result of the calculation of the expression depends on which variables (parameters) with which values are visible at the point of calculation. And the snag is that in Jinja2 there are not just scopes (which can be nested), but also with complicated "transparency" rules. For example, here is a template:

{% set param1=10 %} {{ param1 }}

As a result of its rendering, the text will be received 10
The option is a bit more complicated:

 {% set param1=10 %} {{ param1 }} {% for param1 in range(10) %}-{{ param1 }}-{% endfor %} {{ param1 }}

Rendered already in 10-0--1--2--3--4--5--6--7--8--9-10
The cycle generates a new scop in which you can define your own variable parameters, and these parameters will not be visible outside the scope, just as they will not overwrite the values of the same parameters in the external one. More cunning with extends / block constructions, but this is better read in the Jinja2 documentation.

Thus, the computing context appears. Or rather, rendering in general:

 class RenderContext { public: RenderContext(const InternalValueMap& extValues, IRendererCallback* rendererCallback); InternalValueMap& EnterScope(); void ExitScope(); auto FindValue(const std::string& val, bool& found) const { for (auto p = m_scopes.rbegin(); p != m_scopes.rend(); ++ p) { auto valP = p->find(val); if (valP != p->end()) { found = true; return valP; } } auto valP = m_externalScope->find(val); if (valP != m_externalScope->end()) { found = true; return valP; } found = false; return m_externalScope->end(); } auto& GetCurrentScope() const; auto& GetCurrentScope(); auto& GetGlobalScope(); auto GetRendererCallback(); RenderContext Clone(bool includeCurrentContext) const; private: InternalValueMap* m_currentScope; const InternalValueMap* m_externalScope; std::list<InternalValueMap> m_scopes; IRendererCallback* m_rendererCallback; };

From here .

The context contains a pointer to a collection of values obtained when calling a rendering function, a list (stack) of scopes, the current active scopes and a pointer to a callback interface, with various rendering functions useful for rendering. But about him a little later. The search function of the parameter is consistently raised through the list of contexts up to the external, until it finds the desired parameter.

Now a little about the parameters themselves. From the point of view of the external interface (and its users), Jinja2 supports the following list of valid types:

Numbers (int, double)
Strings (narrow, wide)
bool
Arrays (more like dimensionless tuples)
Dictionaries
Reflected C ++ Structures

All this is described by a special data type created on the basis of boost :: variant:

 using ValueData = boost::variant<EmptyValue, bool, std::string, std::wstring, int64_t, double, boost::recursive_wrapper<ValuesList>, boost::recursive_wrapper<ValuesMap>, GenericList, GenericMap>; class Value { public: Value() = default; template<typename T> Value(T&& val, typename std::enable_if<!std::is_same<std::decay_t<T>, Value>::value>::type* = nullptr) : m_data(std::forward<T>(val)) { } Value(const char* val) : m_data(std::string(val)) { } template<size_t N> Value(char (&val)[N]) : m_data(std::string(val)) { } Value(int val) : m_data(static_cast<int64_t>(val)) { } const ValueData& data() const {return m_data;} ValueData& data() {return m_data;} private: ValueData m_data; };

From here .

Of course, the elements of arrays and dictionaries can be of any of the listed types. But the problem is that for internal use this set of types is too narrow. To simplify implementation, support for the following additional types was needed:

String in the target format. It can be narrow or wide depending on what type of template is being rendered.
callable type
AST tree node
Key-value pair

Through such an extension, it became possible to transfer service data through the rendering context, which otherwise would have to be “shined” in public headers, as well as more successfully generalizing some algorithms that work with arrays and dictionaries.

Boost :: variant was not chosen by chance. Its rich features are used to work with parameters of specific types. Jinja2CppLight uses polymorphic classes for the same purpose, and inja uses the nlohmann json library type system. Both of these alternatives, alas, did not suit me. The reason: the possibility of n-ary dispatch for boost :: variant (and now std :: variant). For a variant type, you can make a static visitor that takes two specific stored types, and set it on a couple of values. And everything will work as it should! In the case of polymorphic classes or simple union'ami such convenience will not work:

 struct StringJoiner : BaseVisitor<> { using BaseVisitor::operator (); InternalValue operator() (EmptyValue, const std::string& str) const { return str; } InternalValue operator() (const std::string& left, const std::string& right) const { return left + right; } };

From here .

Such a visitor is called very simply:

 InternalValue delimiter = m_args["d"]->Evaluate(context); for (const InternalValue& val : values) { if (isFirst) isFirst = false; else result = Apply2<visitors::StringJoiner>(result, delimiter); result = Apply2<visitors::StringJoiner>(result, val); }

Apply2 here is a wrapper over boost::apply_visitor , which applies a visitor of a given type parameter to a pair of variant values, doing some transformations first, if necessary. If the designer of the visitor needs parameters - they are passed after the objects to which the visitor is applied:

 comparator = [](const KeyValuePair& left, const KeyValuePair& right) { return ConvertToBool(Apply2<visitors::BinaryMathOperation>(left.value, right.value, BinaryExpression::LogicalLt, BinaryExpression::CaseSensitive)); };

Thus, the logic of performing operations with parameters is as follows: variant (s) -> unpacking with the help of visitor -> performing the desired action on specific values of specific types -> packing the result back into the variant. And a minimum of hidden magic. It would be possible to implement everything as in js: perform operations (for example, additions) in any case, choosing a certain system of converting strings to numbers, numbers to strings, strings to lists, etc. And to get strange and unexpected results. I chose a simpler and more predictable way: if an operation on a value (or a pair of values) is impossible or illogical, then an empty result is returned. Therefore, if you add a number with a string, you can only get a string as a result if the concatenation operation ('~') is used. Otherwise, the result will be an empty value. The priority of operations is determined by the grammar, so no additional checks during the processing of AST are needed.

Filters and tests

What is called the "standard library" in other languages in Jinja2 is called "filters". In essence, a filter is a kind of complex operation on a value to the left of the '|' sign, the result of which will be a new value. Filters can be chained by organizing pipeline:
{{ menuItems | selectattr('visible') | map(attribute='title') | map('upper') | join(' -> ') }}
Here, only those elements from which the visible attribute is set to true will be selected from the menuItems array, then the title attribute will be taken from these elements, converted to upper case, and the resulting list of strings will be glued with a separator '->' into one line. Or, say, in the example of "life":

 {% macro MethodsDecl(class, access) %} {% for method in class.methods | rejectattr('isImplicit') | selectattr('accessType', 'in', access) %} {{ method.fullPrototype }}; {% endfor %} {% endmacro %}

From here .

Alternative option

 {% macro MethodsDecl(class, access) %} {{ for method in class.methods | rejectattr('isImplicit') | selectattr('accessType', 'in', access) | map(attribute='fullPrototype') | join(';\n') }}; {% endmacro %}

This macro enumerates all the methods of a given class, discards those for which the isImplicit attribute is set to true, from the rest it selects those for which the value of the accessType attribute matches one of the ones specified, and displays their prototypes. Relatively visually. And it is much simpler than three-story cycles and if'y to fence. By the way, something similar in C ++ can be done within the range v.3 specification.

Actually, the main error in time was associated with the implementation of about forty filters, which I included in the basic set. From something I took that I could handle it in a week or two. It was too optimistic. And although the typical implementation of the filter is quite simple: take a value and apply a certain functor to it, they turned out to be too many, and I had to tinker.
A separate interesting task in the implementation process was the logic of argument processing. In Jinja2, as in python, the arguments passed to the call can be both named and positional. And the parameters in the declaration of the filter can be both mandatory and optional (with default values). Moreover, unlike C ++, optional parameters can be anywhere in the declaration. It was necessary to invent an algorithm for combining these two lists, taking into account different cases. Here, let's say, is the function range: range([start, ]stop[, step]) . It can be called in the following ways:

 range(10) // -> range(start = 0, stop = 10, step = 1) range(1, 10) // -> range(start = 1, stop = 10, step = 1) range(1, 10, 3) // -> range(start = 1, stop = 10, step = 3) range(step=2, 10) // -> range(start = 0, stop = 10, step = 2) range(2, step=2, 10) // -> range(start = 2, stop = 10, step = 2)

And so on. And I would very much like that in the implementation code of the filter function it was not necessary to take into account all these cases. As a result, I stopped at the fact that in the code of a filter, a tester or a function, the parameters are obtained strictly by name. And a separate function compares the actual argument list with the expected parameter list while checking that all the required parameters are specified in one way or another:

Big piece of code

 ParsedArguments ParseCallParams(const std::initializer_list<ArgumentInfo>& args, const CallParams& params, bool& isSucceeded) { struct ArgInfo { ArgState state = NotFound; int prevNotFound = -1; int nextNotFound = -1; const ArgumentInfo* info = nullptr; }; boost::container::small_vector<ArgInfo, 8> argsInfo(args.size()); boost::container::small_vector<ParamState, 8> posParamsInfo(params.posParams.size()); isSucceeded = true; ParsedArguments result; int argIdx = 0; int firstMandatoryIdx = -1; int prevNotFound = -1; int foundKwArgs = 0; // Find all provided keyword args for (auto& argInfo : args) { argsInfo[argIdx].info = &argInfo; auto p = params.kwParams.find(argInfo.name); if (p != params.kwParams.end()) { result.args[argInfo.name] = p->second; argsInfo[argIdx].state = Keyword; ++ foundKwArgs; } else { if (argInfo.mandatory) { argsInfo[argIdx].state = NotFoundMandatory; if (firstMandatoryIdx == -1) firstMandatoryIdx = argIdx; } else { argsInfo[argIdx].state = NotFound; } if (prevNotFound != -1) argsInfo[prevNotFound].nextNotFound = argIdx; argsInfo[argIdx].prevNotFound = prevNotFound; prevNotFound = argIdx; } ++ argIdx; } int startPosArg = firstMandatoryIdx == -1 ? 0 : firstMandatoryIdx; int curPosArg = startPosArg; int eatenPosArgs = 0; // Determine the range for positional arguments scanning bool isFirstTime = true; for (; eatenPosArgs < posParamsInfo.size(); ++ eatenPosArgs) { if (isFirstTime) { for (; startPosArg < args.size() && (argsInfo[startPosArg].state == Keyword || argsInfo[startPosArg].state == Positional); ++ startPosArg) ; isFirstTime = false; continue; } int prevNotFound = argsInfo[startPosArg].prevNotFound; if (prevNotFound != -1) { startPosArg = prevNotFound; } else if (curPosArg == args.size()) { break; } else { int nextPosArg = argsInfo[curPosArg].nextNotFound; if (nextPosArg == -1) break; curPosArg = nextPosArg; } } // Map positional params to the desired arguments int curArg = startPosArg; for (int idx = 0; idx < eatenPosArgs && curArg != -1; ++ idx, curArg = argsInfo[curArg].nextNotFound) { result.args[argsInfo[curArg].info->name] = params.posParams[idx]; argsInfo[curArg].state = Positional; } // Fill default arguments (if missing) and check for mandatory for (int idx = 0; idx < argsInfo.size(); ++ idx) { auto& argInfo = argsInfo[idx]; switch (argInfo.state) { case Positional: case Keyword: continue; case NotFound: { if (!IsEmpty(argInfo.info->defaultVal)) result.args[argInfo.info->name] = std::make_shared<ConstantExpression>(argInfo.info->defaultVal); break; } case NotFoundMandatory: isSucceeded = false; break; } } // Fill the extra positional and kw-args for (auto& kw : params.kwParams) { if (result.args.find(kw.first) != result.args.end()) continue; result.extraKwArgs[kw.first] = kw.second; } for (auto idx = eatenPosArgs; idx < params.posParams.size(); ++ idx) result.extraPosArgs.push_back(params.posParams[idx]); return result; }

From here .

It is called in this way (for, say, range ):

 bool isArgsParsed = true; auto args = helpers::ParseCallParams({{"start"}, {"stop", true}, {"step"}}, m_params, isArgsParsed); if (!isArgsParsed) return InternalValue();

and returns the following structure:

 struct ParsedArguments { std::unordered_map<std::string, ExpressionEvaluatorPtr<>> args; std::unordered_map<std::string, ExpressionEvaluatorPtr<>> extraKwArgs; std::vector<ExpressionEvaluatorPtr<>> extraPosArgs; ExpressionEvaluatorPtr<> operator[](std::string name) const { auto p = args.find(name); if (p == args.end()) return ExpressionEvaluatorPtr<>(); return p->second; } };

the necessary argument from which is taken simply by its name:

 auto startExpr = args["start"]; auto stopExpr = args["stop"]; auto stepExpr = args["step"]; InternalValue startVal = startExpr ? startExpr->Evaluate(values) : InternalValue(); InternalValue stopVal = stopExpr ? stopExpr->Evaluate(values) : InternalValue(); InternalValue stepVal = stepExpr ? stepExpr->Evaluate(values) : InternalValue();

A similar mechanism is used when working with macros and testers. And although, it seems, there is nothing difficult in describing the arguments of each filter and test, not (as well as implementing it), but even the “basic” set, which included about fifty of both, turned out to be quite voluminous to implement. And this is on the condition that it did not include all sorts of tricky things, like formatting strings under HTML (or C ++), outputting values in formats like xml or json, and so on.

The next part focuses on the implementation of working with several templates (export, include, macros), as well as exciting adventures with the implementation of error handling and working with strings of different widths.

Traditionally, links:

Jinja2 Specification
Jinja2Cpp implementation

Source: https://habr.com/ru/post/419011/

All Articles

Jinja2 in the world of C ++, part two. Rendering

Rendering as such

Expression evaluation

Filters and tests

More articles: