
Error in HTTP protocol

In this article I want to talk less about the error in RFC 2616 itself and more about my approach to writing a parser for HTTP messages, and to show its advantages and disadvantages. The approach rests on two principles: “better to lose an hour and then fly there in five minutes” and “let the computer do the work while I rest.”

So, the overall task: implement an HTTP server and, in particular, an HTTP parser. Protocol version 1.1 is described in RFC 2616. The specification has two distinguishable parts: a descriptive part and the BNF rules that define the syntax of messages. BNF is well formalized; there is even RFC 5234, in which the BNF grammar is itself described using BNF rules. Admittedly, RFC 5234 was published later than RFC 2616 (HTTP), and its notation has a few minor differences.

BNF, brief excursion

The BNF grammar is quite simple, so I will give one example with explanations; I think that is enough to get the idea (for those who are not familiar with it).
 start-line = Request-Line | Status-Line
 generic-message = start-line
                   *(message-header CRLF)
                   CRLF
                   [ message-body ]

In plain language, this reads:
1) start-line is either a Request-Line or a Status-Line (these are rules too, defined elsewhere).
2) generic-message is the sequence start-line, *(message-header CRLF), CRLF, and optionally a message-body (square brackets [...] mark an optional element). Here *(message-header CRLF) allows zero or more repetitions of the concatenation of the two rules message-header and CRLF.
It resembles regular expressions, which is not surprising.
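
To make the resemblance to regular expressions concrete, here is a minimal Python sketch that transcribes the two rules above into a regex. The patterns for Request-Line, Status-Line and message-header are simplified stand-ins assumed for illustration, not the full RFC 2616 rules, and the names (GENERIC_MESSAGE, is_generic_message) are hypothetical.

```python
import re

CRLF = r"\r\n"

# Simplified stand-ins (assumptions), NOT the complete RFC 2616 rules.
REQUEST_LINE = r"[A-Z]+ [^ \r\n]+ HTTP/\d+\.\d+" + CRLF
STATUS_LINE = r"HTTP/\d+\.\d+ \d{3} [^\r\n]*" + CRLF
MESSAGE_HEADER = r"[^:\r\n]+:[^\r\n]*"

# start-line = Request-Line | Status-Line
START_LINE = f"(?:{REQUEST_LINE}|{STATUS_LINE})"

# generic-message = start-line *(message-header CRLF) CRLF [ message-body ]
GENERIC_MESSAGE = re.compile(
    f"{START_LINE}(?:{MESSAGE_HEADER}{CRLF})*{CRLF}(?:.*)?",
    re.DOTALL,  # let the optional message-body span several lines
)


def is_generic_message(text: str) -> bool:
    """Return True if the whole text matches the generic-message sketch."""
    return GENERIC_MESSAGE.fullmatch(text) is not None
```

Each BNF operator maps onto a regex operator: `|` stays `|`, `*( ... )` becomes `(?: ... )*`, and `[ ... ]` becomes `(?: ... )?`.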

About the error

To make the error traceable, I have quoted a number of rules below. Skimming them, you can see the following: a request consists of a Request-Line followed by repeated headers, each terminated by CRLF. One of the header groups is entity-header, which, unlike general-header and request-header, includes extension-header. The extension-header rule permits non-standard headers; in other words, thanks to this rule you can add a header such as
 My-Header: I am server
to a request, and the request remains valid. This rule also makes it possible to write protocol extensions. But since extension-header permits any header, including the standard ones (From, Accept, Host, Referer, and so on), the following situation arises: if a message contains a malformed standard header, the rule that describes that header rejects it, yet extension-header accepts it anyway, which is incorrect.
 Request = Request-Line              ; Section 5.1
           *(( general-header        ; Section 4.5
            | request-header         ; Section 5.3
            | entity-header ) CRLF)  ; Section 7.1
           CRLF
           [ message-body ]          ; Section 4.3

 entity-header = Allow               ; Section 14.7
               | Content-Encoding    ; Section 14.11
               | Content-Language    ; Section 14.12
               | Content-Length      ; Section 14.13
               | Content-Location    ; Section 14.14
               | Content-MD5         ; Section 14.15
               | Content-Range       ; Section 14.16
               | Content-Type        ; Section 14.17
               | Expires             ; Section 14.21
               | Last-Modified       ; Section 14.29
               | extension-header

 extension-header = message-header

 message-header = field-name ":" [ field-value ]
 field-name     = token
 field-value    = *( field-content | LWS )
 field-content  = <the OCTETs making up the field-value
                  and consisting of either *TEXT or combinations
                  of token, separators, and quoted-string>
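
The ambiguity can be reproduced with a small Python sketch. It models only two alternatives of entity-header: Content-Length (as "Content-Length" ":" 1*DIGIT, with optional surrounding whitespace) and extension-header. The field-value pattern is a simplification assumed for illustration, and the function name matches_entity_header is hypothetical.

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC 2616, section 2.2)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# Content-Length = "Content-Length" ":" 1*DIGIT
CONTENT_LENGTH = re.compile(r"Content-Length:[ \t]*\d+[ \t]*", re.IGNORECASE)

# extension-header = message-header = field-name ":" [ field-value ]
# The field-value part is simplified (assumption): anything up to the CRLF.
EXTENSION_HEADER = re.compile(TOKEN + r":[^\r\n]*")


def matches_entity_header(line: str) -> bool:
    """entity-header = ... | Content-Length | ... | extension-header"""
    return bool(
        CONTENT_LENGTH.fullmatch(line) or EXTENSION_HEADER.fullmatch(line)
    )


bad = "Content-Length: abc"
rejected_by_own_rule = CONTENT_LENGTH.fullmatch(bad) is None
accepted_by_grammar = matches_entity_header(bad)
```

Here the Content-Length rule rejects the malformed header, but the grammar as a whole still accepts it through the extension-header alternative.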

Unfortunately, BNF has no way to express a rule of the form “something that does not include something else”. The specification states such rules informally:
 ctext = <any TEXT excluding "(" and ")">
 qdtext = <any TEXT except <">>

The correct field-name rule should look something like this:
 field-name = <any token excluding "Accept", "Allow", ... all header names from rfc 2616, 2617 ...> 
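
In a hand-written parser, this kind of exclusion is easy to express even though BNF cannot. A minimal Python sketch follows; the set of standard names is deliberately partial and illustrative, and the names STANDARD_NAMES and is_extension_field_name are hypothetical.

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC 2616, section 2.2)
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Partial, illustrative list (assumption); a real parser would list every
# header name defined in RFC 2616, RFC 2617, and so on.
STANDARD_NAMES = {"accept", "allow", "content-length", "from", "host", "referer"}


def is_extension_field_name(name: str) -> bool:
    """A token that is NOT one of the standard header names.

    Header names are case-insensitive, so the comparison uses lower case.
    """
    return TOKEN.fullmatch(name) is not None and name.lower() not in STANDARD_NAMES
```

With this check, My-Header is still accepted as an extension header name, while Content-Length in any letter case is left to its own dedicated rule.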

The main thing

I had intended to talk about tools that build a DFA (deterministic finite automaton) from BNF rules, so that, ideally, the parser is generated automatically. But somehow the attention-grabbing topic from the title took up too much time, so the tools will have to wait until next time.

UPD: To those who downvoted, please share your opinion. If I am wrong, I would like to know where.

Source: https://habr.com/ru/post/153037/

