Writing your template engine in Python

Surely, many of you thought about how template engines are arranged, what is its internal structure and how is it converted into fragments of HTML-code, but did not guess about some features of its implementation. Therefore, let's implement a simplified version of the template engine and demonstrate how it works “under the hood”.

First, you need to decide on the syntax with which our engine will work. In other words, language constructs that the template engine will understand and process them accordingly.

Language syntax

Although in our case the language will be greatly simplified, but for example, we will work with variables and blocks that will look something like this:

<!—-     `{{`  `}}` --> <div>{{my_var}}</div> <!—-    `{%`  `%}` --> {% each items %} <div>{{it}}</div> {% end %}

')
Most blocks must be closed as in the example above and end with the end tag.
Our templating engine will also support working with cycles and conditions. Do not forget to add support for calls inside the blocks - this will be quite a handy thing that may come in handy.

Cycles

With their help, we will be able to bypass collections and get an element with which we will perform the necessary operations:

 {% each people %} <div>{{it.name}}</div> {% end %} {% each [1, 2, 3] %} <div>{{it}}</div> {% end %} {% each records %} <div>{{..name}}</div> {% end %}

In this example, people is a collection and it refers to an element from it. Point, as a separator, allows you to refer to the fields of the object to extract the necessary information. Using ".." will provide access to names located in the context of the parent.

Conditions

No need to present. Our engine will support if ... else constructions , as well as operators: ==, <=,> =, =, is,>, <! .

 {% if num > 5 %} <div> 5</div> {% else %} <div>   5</div> {% end %}

Function calls

Calls must be specified inside the template. Let's not forget, of course, support for named and positional parameters. Blocks calling functions should not be closed.

 <!—-   ... --> <div class='date'>{% call prettify date_created %}</div> <!-- ...   --> <div>{% call log 'here' verbosity='debug' %}</div>

Theoretical part

Before delving into the details of the engine, which will be engaged in rendering templates, it is necessary to have an idea of how to present templates in memory.

In our case, abstract syntax trees will be used (hereinafter referred to as the ASD), so necessary for the presentation of data. ASD is the result of the lexical analysis of the source code. This structure has many advantages compared to the source code, one of which is the elimination of unnecessary text elements (for example, separators).

We will parse the data and analyze the template, building the corresponding tree, which will represent a compiled template. Rendering a template will be a simple tree walk, which will return tree elements formed into fragments of HTML code.

Syntax definition

The first step in our difficult business will be the separation of content into fragments. Each fragment is an HTML tag. For division of content, regular expressions will be used, as well as the split () function.

 VAR_TOKEN_START = '{{' VAR_TOKEN_END = '}}' BLOCK_TOKEN_START = '{%' BLOCK_TOKEN_END = '%}' TOK_REGEX = re.compile(r"(%s.*?%s|%s.*?%s)" % ( VAR_TOKEN_START, VAR_TOKEN_END, BLOCK_TOKEN_START, BLOCK_TOKEN_END ))

So let's analyze the TOK_REGEX. In this regular expression, we have the choice between a variable or a block. This has a definite meaning - we want to divide the content into variables or blocks. The wrapper in the form of tags that have been specified in advance will help us identify the fragments that need to be processed. The? Sign specified in a regular expression is not a greedy repetition. This is necessary so that the regular expression is “lazy” and stops at the first match, for example, when it is necessary to extract the variables specified inside the block. By the way, here you can read about how to control the greed of regular expressions.

Here is a simple example demonstrating the work of this regular season:

 >>> TOK_REGEX.split('{% each vars %}<i>{{it}}</i>{% endeach %}') ['{% each vars %}', '<i>', '{{it}}', '</i>', '{% endeach %}']

In addition, each processed fragment will have its own type, which will be taken into account when processing and compiling. The fragments will be divided into four types:

 VAR_FRAGMENT = 0 OPEN_BLOCK_FRAGMENT = 1 CLOSE_BLOCK_FRAGMENT = 2 TEXT_FRAGMENT = 3

Formation of ASD

After analyzing the regular expression of the source text of the HTML page containing fragments belonging to our template engine, it is necessary to build a tree based on elements that belong to our “language”. We will have the class Node, which is the root of the tree and contains the child nodes, which are subclasses for each type of node. Subclasses should contain the process_fragment () and render () methods:
- process_fragment () is used to further analyze the content and necessary attributes of the Node object.
- render () is needed to convert the corresponding fragment into HTML code

Optionally implement the enter_scope () and exit_scope () methods, which are called during the compiler operation. The first function, enter_scope () , is called when the node creates a new area (more on this later), and exit_scope () to leave the current area being processed when the area has finished processing.

Node Base Class:

 class _Node(object): def __init__(self, fragment=None): self.children = [] self.creates_scope = False self.process_fragment(fragment) def process_fragment(self, fragment): pass def enter_scope(self): pass def render(self, context): pass def exit_scope(self): pass def render_children(self, context, children=None): if children is None: children = self.children def render_child(child): child_html = child.render(context) return '' if not child_html else str(child_html) return ''.join(map(render_child, children))

Here is an example of the Variable subclass:

 class _Variable(_Node): def process_fragment(self, fragment): self.name = fragment def render(self, context): return resolve_in_context(self.name, context)

When determining the node, a fragment of the text will be analyzed, which will be shown to us by the type of this fragment (i.e. it is a variable, a bracket, etc.)
Text and variables will be converted to their corresponding subclasses.
If these are cycles, then their processing will take a little longer, because it means a whole series of commands that need to be executed. It is quite simple to find out that this is a block of commands: it is only necessary to analyze a fragment of the text contained in “{%” and “%}”. Here is a simple example:

 {% each items %}

Where each is the intended block of commands.

The important point is that each node creates an area. During the compilation process, we monitor the current region and the nodes that are added within this region. If during the analysis the closing bracket is encountered, the formation of the current region is completed, and the transition to the next one takes place.

 def compile(self): root = _Root() scope_stack = [root] for fragment in self.each_fragment(): if not scope_stack: raise TemplateError('nesting issues') parent_scope = scope_stack[-1] if fragment.type == CLOSE_BLOCK_FRAGMENT: parent_scope.exit_scope() scope_stack.pop() continue new_node = self.create_node(fragment) if new_node: parent_scope.children.append(new_node) if new_node.creates_scope: scope_stack.append(new_node) new_node.enter_scope() return root

Rendering

The final step is the conversion of the SDA to HTML. To do this, we visit all the nodes of the tree and call the render () method. In the rendering process, it is necessary to take into account what is being done at the moment: with literals or with the context of the variable name. To do this, use ast.literal_eval () , which safely allows you to analyze the string:

 def eval_expression(expr): try: return 'literal', ast.literal_eval(expr) except ValueError, SyntaxError: return 'name', expr

If we deal with the context of a variable name, then we analyze what is indicated with it: “.” Or “..”:

 def resolve(name, context): if name.startswith('..'): context = context.get('..', {}) name = name[2:] try: for tok in name.split('.'): context = context[tok] return context except KeyError: raise TemplateContextError(name)

Conclusion

This article is a translation that allows you to give a general idea of how template engines work. Although this is the simplest example of implementation, it can be used as a basis for building more complex template engines.

Full source code, as well as examples of use can be found here.

Source: https://habr.com/ru/post/180935/

All Articles