
The inner world of Razor. Part 1 - Recursive Ping Pong

This is the first article about Razor, the new ASP.NET parser. We worked on it for a long time, and I would like to tell readers how it works.

The Razor parser is very different from the existing ASPX parser. The ASPX parser is built almost entirely on regular expressions, because its syntax is fairly simple to parse. The Razor parser is divided into three components:
  1. A markup parser that has a basic understanding of HTML syntax.
  2. A code parser that has a basic understanding of C# or VB.
  3. The main “conductor”, which knows how to put the two parsers together.

When I say “basic understanding” I really do mean basic: these are not fully independent C# and HTML parsers. On our team, we jokingly call them “Markup Identifier” and “Code Thinking” :)

In total, three “actors” perform on the Razor stage: the Core Parser, the Markup Parser and the Code Parser. All three work together to parse a Razor document. Now let's take a Razor file and walk through the whole parsing procedure in terms of these actors. We will use the following example:


So let's start at the top. At any moment of parsing, the Razor parser is in one of three states: parsing document markup, parsing a markup block, or parsing a code block. The first two are handled by the markup parser, the last by the code parser. When the core parser is first launched, it calls the markup parser and asks it to parse the document markup and return the result. The parser is now in the document-markup state. In this state it simply searches for the “@” character; it does not care which tags or other HTML it comes across, its only target is “@”. When it finds an “@”, it has to decide: is this a switch to code, or part of an email address? The decision is based on the characters before and after the “@”, checking whether they could form a valid email address. This is just the standard procedure; a sequence of checks determines whether to switch to code mode.
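The “@” decision can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption, namely that an “@” counts as part of an email address when it has a letter or digit directly on both sides; the function names are hypothetical, and the real Razor checks are more involved.

```python
def looks_like_email(text: str, at_index: int) -> bool:
    """Assumed rule: '@' belongs to an email address only when a letter
    or digit sits immediately before and immediately after it."""
    before = text[at_index - 1] if at_index > 0 else ""
    after = text[at_index + 1] if at_index + 1 < len(text) else ""
    # "".isalnum() is False, so an '@' at either edge is never an email.
    return before.isalnum() and after.isalnum()


def is_code_transition(text: str, at_index: int) -> bool:
    """True when the character at at_index is an '@' that switches to code."""
    return text[at_index] == "@" and not looks_like_email(text, at_index)
```

With this rule, the “@” in `user@example.com` stays markup, while an “@” preceded by a space is treated as a switch to code.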

In this case, the first “@” character we see is preceded by a space, which is not valid in an email address. So we know for sure that we need to switch to code mode. The markup parser calls into the code parser and asks it to parse a code block. A block, as the Razor parser defines it, is basically a single piece of code or markup with a clear start and end. So “foreach” in our case is an example of a code block: it starts with the character “f” and ends with “}”. The code parser knows enough about C# to understand this, so it starts parsing the code. The code parser does some simple tracking of C# statements, so when it gets to “<li>”, it knows that the tag sits at the start of a C# statement. “<li>” cannot appear at the start of a C# statement, so the code parser knows that a markup block starts at this point. It therefore calls back into the markup parser to parse the block of HTML. This creates a kind of recursive ping-pong between the code and markup parsers. We started with markup, then called into code, then into markup again, and so on, until we got the result of the entire call chain:



(Naturally, I have left many helper methods out of the list :).

This sheds light on the fundamental difference between ASPX and Razor. In ASPX files, you can think of code and markup as two parallel streams. You write markup, then jump over and write code, then come back and write markup, and so on. Razor files, by contrast, are a tree. You write markup, then put code inside it, then put markup inside the code, and so on.
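To make the tree picture concrete, here is a small Python sketch. The `Block` class, its fields and the labels are hypothetical illustrations, not Razor's actual types; the point is only that each block owns the blocks nested inside it.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                               # "markup" or "code"
    label: str                              # free-form description
    children: list = field(default_factory=list)


def outline(block, depth=0):
    """Return the tree as indented lines, one line per block."""
    lines = ["  " * depth + f"{block.kind}: {block.label}"]
    for child in block.children:
        lines.extend(outline(child, depth + 1))
    return lines


# A foreach loop inside the document, an <li> inside the loop,
# and two code expressions inside the <li>.
tree = Block("markup", "document", [
    Block("code", "foreach", [
        Block("markup", "<li>", [
            Block("code", "expression"),
            Block("code", "expression"),
        ]),
    ]),
])

print("\n".join(outline(tree)))
```

An ASPX-style view of the same file would instead be a flat sequence of alternating markup and code segments, with no segment containing another.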

So, we have just called the markup parser to parse a markup block. The block starts with “<li>” and ends with “</li>”. Until we find “</li>”, we will not consider the markup block finished. So if you have a “}” somewhere inside “<li>”, it will not close the “foreach”, because we have not yet returned far enough up the stack.
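The recursion described so far, including the rule that a “}” inside “<li>” does not close the “foreach”, can be sketched as a toy parser in Python. Everything here is an assumption made for illustration: the class and method names, the keyword list, the email rule, and the input string, which is a hypothetical file of the same shape rather than the article's original sample. The toy also does not handle nested tags of the same name or braces inside strings.

```python
import re

# Hypothetical input; note the stray "}" inside <li>.
SAMPLE = """<ul>
    @foreach (var p in Model.Products) {
        <li>@p.Name costs @p.Price }</li>
    }
</ul>"""

KEYWORDS = {"foreach", "for", "if", "while"}   # assumed subset
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>")


class ToyParser:
    def __init__(self, text):
        self.text, self.pos, self.depth = text, 0, 0
        self.trace = []   # (stack depth, state) recorded on every parser call

    def _enter(self, state):
        self.trace.append((self.depth, state))
        self.depth += 1

    def _at_is_code(self):
        # Assumed rule: "@" between two alphanumerics is an email address.
        before = self.text[self.pos - 1:self.pos] if self.pos else ""
        after = self.text[self.pos + 1:self.pos + 2]
        return not (before.isalnum() and after.isalnum())

    def _scan_markup(self, end):
        # Markup only looks for "@" and for its own end marker, if it has one.
        children = []
        while self.pos < len(self.text):
            if end and self.text.startswith(end, self.pos):
                self.pos += len(end)
                break
            if self.text[self.pos] == "@" and self._at_is_code():
                self.pos += 1
                children.append(self.parse_code_block())
            else:
                self.pos += 1      # any other character, "}" included
        return children

    def parse_markup_document(self):
        self._enter("markup document")
        children = self._scan_markup(end=None)
        self.depth -= 1
        return ("markup document", children)

    def parse_markup_block(self):
        # Caller guarantees that TAG matches at the current position.
        self._enter("markup block")
        tag = TAG.match(self.text, self.pos)
        self.pos = tag.end()
        children = self._scan_markup(end="</" + tag.group(1) + ">")
        self.depth -= 1
        return ("markup <" + tag.group(1) + ">", children)

    def parse_code_block(self):
        self._enter("code block")
        m = IDENT.match(self.text, self.pos)
        word, children = (m.group(0) if m else ""), []
        if word in KEYWORDS:
            # Statement block: skip the header, then read up to the matching "}".
            self.pos = self.text.index("{", self.pos) + 1
            braces, statement_start = 1, True
            while braces > 0 and self.pos < len(self.text):
                ch = self.text[self.pos]
                if ch == "<" and statement_start and TAG.match(self.text, self.pos):
                    # A tag cannot start a statement: hand over to markup.
                    children.append(self.parse_markup_block())
                    continue
                braces += (ch == "{") - (ch == "}")
                if not ch.isspace():
                    statement_start = ch in "{};"
                self.pos += 1
        elif m:
            self.pos = m.end()     # simple expression such as p.Name
        self.depth -= 1
        return ("code " + word, children)


parser = ToyParser(SAMPLE)
tree = parser.parse_markup_document()
for depth, state in parser.trace:
    print("  " * depth + state)
```

The “}” after `@p.Price` is consumed by the markup scan as an ordinary character, because at that moment the markup block for “<li>” is on top of the stack. Only the “}” that follows “</li>” is seen by the code parser, and that is the one that ends the “foreach” block.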

While parsing the “<li>” block, the markup parser sees several “@” characters, each of which triggers a call to the code parser. The call stack therefore grows:



I will go into the details of how these blocks are processed later, because the process is a bit complicated. For now, assume we have finished these code blocks and returned to the “<li>” block. Next, we see “</li>”, so we complete that block and return to the “foreach” block. The “}” closes that block, so now we are back at the top of the stack, in the document markup. After that, we read until we reach the end of the file without finding any more “@” characters. And voila! We have parsed the file!

I hope the overall structure of the parsing algorithm is clear. The main thing is to stop thinking of code and markup as being parsed in separate streams, and instead to see the constructs as nested one inside the other. A hint: we drew our inspiration from PowerShell ;).

Source: https://habr.com/ru/post/98958/

