Do you know the difference between %{REQUEST_URI} in Apache mod_rewrite and $_SERVER["REQUEST_URI"] in PHP?
Can you write a correct 301 redirect to or from the www-prefixed domain in .htaccess, at the Apache level?
I still cannot offer a solution to the last question. The reason lies in the HTTP/1.1 protocol, which I had to study in more detail while "reinventing the wheel" (writing the core for my site).
It is all about the "Host:" request header. Under certain conditions it may contain anything at all, and according to HTTP/1.1 the server must ignore it completely. Yet most developers rely on the value of this field, for example for SEO purposes. Looking ahead, I will say that an additional proxy in front of the server (for example, nginx) solves this problem.
To illustrate how servers misbehave, I decided to go through the websites of the companies on Habr. I checked a dozen sites by hand and discovered that some of them respond "correctly" to erroneous requests. After that I wrote a small testing utility, which let me increase both the number of test patterns and the number of sites checked.
What does REQUEST_URI hide in HTTP/1.1?
Theory
HTTP/1.0
I will begin with HTTP/1.0, which is described in RFC 1945 (www.w3.org/Protocols/rfc1945/rfc1945) and dated May 1996. To get the desired page, it was enough to connect to the server and send one line:
GET /path/to/resource.html HTTP/1.0
When talking to a proxy server, the client had to send the full address instead of the absolute path:
GET http://domain.name/path/to/resource.html HTTP/1.0
All of this is described in section 5.1.2, Request-URI.
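To make the two forms concrete, here is a small sketch in JavaScript (the language of the test script used later in this article). `buildRequestLine` is a hypothetical helper, not part of any library; it relies only on the `URL` class built into Node.js.

```javascript
// Hypothetical helper (not from the author's script): builds a request line.
// viaProxy === false -> abs_path form:    "GET /path HTTP/1.0"
// viaProxy === true  -> absoluteURI form: "GET http://host/path HTTP/1.0"
function buildRequestLine(method, url, viaProxy, version) {
  const u = new URL(url); // WHATWG URL parser built into Node.js
  const path = u.pathname + u.search; // search keeps its leading "?", if any
  // the fragment (#...) is never sent, and userinfo is dropped on purpose
  const target = viaProxy ? u.protocol + '//' + u.host + path : path;
  return method + ' ' + target + ' HTTP/' + version;
}

const example = 'http://domain.name/path/to/resource.html';
console.log(buildRequestLine('GET', example, false, '1.0'));
// GET /path/to/resource.html HTTP/1.0
console.log(buildRequestLine('GET', example, true, '1.0'));
// GET http://domain.name/path/to/resource.html HTTP/1.0
```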
The Host header appears
So that one server could serve several domain names at once, the protocol's authors added the "Host:" request header, which carries the domain being accessed. Although this header is not part of the HTTP/1.0 standard, some servers and clients began to support it. For example, wget sends requests over HTTP/1.0 but adds "Host:".
HTTP/1.1
HTTP/1.1 appeared in June 1999 (fourteen years ago) and is described in RFC 2616 (www.w3.org/Protocols/rfc2616/rfc2616.html). In section 14.23, the new protocol requires every request to contain a "Host" header field:
A client MUST include a Host header field in all HTTP/1.1 request messages. If the requested URI does not include an Internet host name for the service being requested, then the Host header field MUST be given with an empty value.
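On the server side, the same section goes on to require that all Internet-based HTTP/1.1 servers respond with 400 (Bad Request) to any HTTP/1.1 request message that lacks a Host header field. A minimal sketch of that check; `checkHostPresence` is a hypothetical name, and the headers are assumed to be already parsed into an object with lower-cased names.

```javascript
// Hypothetical server-side check modeled on RFC 2616, section 14.23.
// An HTTP/1.1 request without a Host header gets 400 (Bad Request).
// An empty Host value is allowed (the URI may carry no host name),
// and HTTP/1.0 requests may omit Host entirely.
function checkHostPresence(version, headers) {
  const hasHost = Object.prototype.hasOwnProperty.call(headers, 'host');
  if (version === 'HTTP/1.1' && !hasHost) {
    return 400;
  }
  return null; // no objection from this check
}
```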
In addition, significant changes were made to the Request-URI in the request line (section 5.1.2). As in the previous protocol, the full address is required for requests to proxy servers ("The absoluteURI form is REQUIRED when the request is being made to a proxy."). But every server must accept such requests, even though clients send them only to proxies:
To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.
Note that a transition to full addresses (absoluteURI, for example, http://www.w3.org/pub/WWW/TheProject.html) was anticipated, so clients are not obliged to use only absolute paths (abs_path, for example, /pub/WWW/TheProject.html). Moreover, the server is explicitly required to accept requests with an absoluteURI, so I reject right away the objection that such a client request is incorrect: "the client is always right."
Host in HTTP/1.1
The changes to the Request-URI may seem harmless, but section 5.2 contains one important requirement: "If Request-URI is an absoluteURI, the host is part of the Request-URI. Any Host header field value in the request MUST be ignored." That is, the request
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: any_text_here
must be interpreted exactly like the request
GET /path/to/resource.html HTTP/1.1
Host: domain.name
Does your server ignore "Host:" when it receives a request with an absoluteURI?
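One way to answer that question in code: the sketch below applies rules 1 and 2 of section 5.2 to a request target and a Host value. `effectiveHost` is a hypothetical helper; the absoluteURI test is simplified to "starts with a scheme followed by a colon", and rule 3 of the same section (a host the server does not serve gets 400 Bad Request) is left to the caller.

```javascript
// Sketch of RFC 2616, section 5.2, rules 1 and 2 (hypothetical helper).
// Returns the host the request is addressed to, or null if none can be
// determined; the caller would then answer 400 (Bad Request).
function effectiveHost(requestTarget, hostHeader) {
  // Rule 1: absoluteURI -> host comes from the URI, Host is ignored.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(requestTarget)) {
    try {
      return new URL(requestTarget).host.toLowerCase() || null;
    } catch (e) {
      return null; // unparsable absoluteURI
    }
  }
  // Rule 2: abs_path -> host comes from the Host header, if present.
  if (typeof hostHeader === 'string' && hostHeader.trim() !== '') {
    return hostHeader.trim().toLowerCase();
  }
  return null;
}
```

With this rule in place, the two requests above resolve to the same host, `domain.name`, whatever the first one carries in "Host:".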
%{REQUEST_URI} and $_SERVER["REQUEST_URI"]
The documentation for mod_rewrite says the following:
THE_REQUEST
The full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). This does not include any additional headers sent by the browser. This value has not been unescaped (decoded), unlike most other variables below.
REQUEST_URI
The path component of the requested URI, such as "/index.html". This notably excludes the query string which is available as its own variable named QUERY_STRING.
That is, %{REQUEST_URI} always contains an absolute path and never a full address.
Try solving the standard SEO task of adding "www" to a domain that lacks it with mod_rewrite, when the user sends the following request:
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: www.domain.name
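For comparison, outside of mod_rewrite the decision is easy once the host is taken from the request line whenever it holds an absoluteURI, as section 5.2 requires. Below is a minimal sketch; `wwwRedirectTarget` is a hypothetical name, it assumes the canonical host is simply "www." plus the bare domain, and it carries the path and query string over unchanged.

```javascript
// Returns the Location for a 301 redirect to the www host, or null if the
// request already targets the www host (or no usable host is known).
// For an absoluteURI the Host header is ignored (RFC 2616, section 5.2).
// An unparsable absoluteURI makes new URL() throw; a caller would map
// that to 400 (Bad Request).
function wwwRedirectTarget(requestTarget, hostHeader) {
  let host;
  let pathAndQuery;
  if (requestTarget.startsWith('/')) {
    host = (hostHeader || '').trim().toLowerCase();
    pathAndQuery = requestTarget;
  } else {
    const u = new URL(requestTarget);
    host = u.host.toLowerCase();
    pathAndQuery = u.pathname + u.search;
  }
  if (host === '' || host.startsWith('www.')) {
    return null;
  }
  return 'http://www.' + host + pathAndQuery;
}
```

For the request above it returns `http://www.domain.name/path/to/resource.html`, while a rule that trusts "Host:" would see "www." and do nothing.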
At the beginning of the article I asked how %{REQUEST_URI} in Apache mod_rewrite differs from $_SERVER["REQUEST_URI"] in PHP, so here is a quote from the PHP documentation:
REQUEST_URI
The URI which was given in order to access this page; for instance, '/index.html'.
This may be configurable somewhere, but my PHP/5.3.13 returns the absoluteURI when I send a request with the full address.
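The difference can be modeled in a few lines. This is only an illustration built from the two documentation quotes and the PHP/5.3.13 behavior described above, not code from Apache or PHP; both function names are hypothetical, and the percent-decoding Apache applies to %{REQUEST_URI} is left out.

```javascript
// Illustration only: what each variable would hold for the same target.
// mod_rewrite %{REQUEST_URI}: path component only, query string excluded.
function apacheRequestUri(requestTarget) {
  const path = requestTarget.startsWith('/')
    ? requestTarget
    : new URL(requestTarget).pathname; // pathname already excludes "?..."
  const q = path.indexOf('?');
  return q === -1 ? path : path.slice(0, q);
}

// PHP $_SERVER["REQUEST_URI"] as observed with PHP/5.3.13: the target
// exactly as sent, including an absoluteURI and the query string.
function phpRequestUri(requestTarget) {
  return requestTarget;
}
```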
Practice
Now let's look at what happens when requests are sent to real servers. I took the site addresses from Habr's company page (the list changes; I took it at the end of last week) and sketched a small Node.js script, in which the http_check function sends single requests and full_http_check generates several requests to one server from a set of templates.
Script code (the listing breaks off in the source):
```javascript
var net = require('net');

// Returns an empty result record, or (when title is truthy) a record whose
// values are the column names, used as the header row of the TSV output.
var default_result = function(title) {
  if (title) {
    return {'title': 'title', 'step': 'step', 'host': 'host', 'request': 'request',
            'header': 'header', 'full_response': 'full_response', 'response': 'response',
            'server': 'server', 'length': 'length', 'location': 'location', 'error': 'error'};
  } else {
    return {'title': '', 'step': '', 'host': '', 'request': '', 'header': '',
            'full_response': '', 'response': '', 'server': '', 'length': '',
            'location': '', 'error': ''};
  }
};

// Serializes one result record as a tab-separated line.
var format_result = function(result) {
  return '' + result['title'].toString() + '\t' + result['step'] + '\t' + result['host'] + '\t' +
    result['request'].toString() + '\t' + result['header'].toString() + '\t' +
    result['response'].toString() + '\t' + result['server'].toString() + '\t' +
    result['length'].toString() + '\t' + result['error'].toString() + '\t' +
    result['location'].toString() + '\t' + result['full_response'].toString();
};

// Sends one raw request to host:80 and records the response.
var http_check = function(title, step, host, req, host_hdr) {
  var host_header = host_hdr || '';
  var result = default_result(false);
  result['title'] = title;
  result['step'] = step;
  result['host'] = host;
  result['request'] = req;
  result['header'] = host_header;
  var dat = '';
  var client = net.connect({port: 80, host: host}, function() {
    // ... the rest of the script is cut off here in the source
```
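The listing above breaks off inside http_check, so here is a minimal sketch of how one such check can be done with Node's built-in net module. This is not the author's code: `buildRawRequest`, `parseStatusAndServer` and `rawCheck` are hypothetical names, and the `Connection: close` header is an assumption of this sketch (it is not part of the request templates) so that HTTP/1.1 servers close the connection after answering.

```javascript
const net = require('net');

// Raw request text: request line, optional Host header, "Connection: close"
// (added by this sketch), then the empty line that ends the header block.
function buildRawRequest(requestLine, hostHeader) {
  let raw = requestLine + '\r\n';
  if (hostHeader) {
    raw += 'Host: ' + hostHeader + '\r\n';
  }
  raw += 'Connection: close\r\n';
  return raw + '\r\n';
}

// Status line and "Server:" header value of a raw response.
function parseStatusAndServer(response) {
  const lines = response.split('\r\n\r\n')[0].split('\r\n');
  const server = lines.slice(1).find((line) => /^server:/i.test(line));
  return {
    status: lines[0],
    server: server ? server.slice(server.indexOf(':') + 1).trim() : '',
  };
}

// Sends one request to host:80; callback(err, {status, server}) runs once.
function rawCheck(host, requestLine, hostHeader, callback) {
  let data = '';
  let finished = false;
  const finish = (err, result) => {
    if (!finished) {
      finished = true;
      callback(err, result);
    }
  };
  const client = net.connect({ port: 80, host: host }, () => {
    // 'latin1' writes every character as one byte, including a NUL in Host
    client.write(buildRawRequest(requestLine, hostHeader), 'latin1');
  });
  client.setTimeout(10000, () => client.destroy(new Error('timeout')));
  client.on('data', (chunk) => { data += chunk.toString('latin1'); });
  client.on('end', () => finish(null, parseStatusAndServer(data)));
  client.on('error', (err) => finish(err));
}
```

For example, `rawCheck('domain.name', 'GET http://domain.name/path/to/resource.html HTTP/1.1', 'local.fake', (err, res) => console.log(err || res))` would send request 8 to a server named `domain.name` (a placeholder host).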
Now let's take a closer look at each of the templates and at how the sites responded.
Request 1
The most common form of an HTTP/1.1 request, with an absolute path and a correct Host header. Any server should respond to it correctly, so we expect "HTTP/1.1 200 OK".
GET /path/to/resource.html HTTP/1.1
Host: domain.name
All servers returned HTTP/1.1 200 OK. Below is a table of the "Server" response header values:
Company | "Server:" header |
---|---|
Apps4all | nginx/1.0.15 |
Badoo | nginx |
Box Overview | nginx/1.2.1 |
Devconf | nginx/1.0.15 |
e-Legion Ltd. | nginx/1.0.5 |
IBM | IBM_HTTP_Server |
Intel | Microsoft-IIS/7.5 |
JetBrains | nginx |
KolibriOS Project Team | lighttpd/1.4.32 |
Mail.Ru Group | nginx/1.2.5 |
Microsoft | Microsoft-IIS/7.5 |
Opera Software ASA | nginx |
Rusonyx | nginx |
UIDG | Apache |
Zfort group | nginx/1.4.1 |
VimpelCom (Beeline) | Microsoft-IIS/7.5 |
Mosigra | nginx/1.4.1 |
Nordavind | nginx/1.0.4 |
Yandex | nginx/1.2.1 |
Request 2
A variant of the first request, but with the full address instead of the absolute path.
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: domain.name
All servers were unanimous again: every one of them can handle such "easy" requests.
Request 3
An HTTP/1.0 request with an absolute path and without "Host:". We should get "HTTP/1.0 200 OK".
GET /path/to/resource.html HTTP/1.0
On the third request the servers stumbled: not a single one answered "HTTP/1.0 200 OK".
Company | Server response |
---|---|
Apps4all | HTTP/1.1 301 Moved Permanently |
Badoo | HTTP/1.1 302 Moved Temporarily |
Box Overview | HTTP/1.1 200 OK |
Devconf | HTTP/1.1 404 Not Found |
e-Legion Ltd. | HTTP/1.1 301 Moved Permanently |
IBM | HTTP/1.1 200 OK |
Intel | HTTP/1.0 400 Bad Request |
JetBrains | HTTP/1.1 301 Moved Permanently |
KolibriOS Project Team | HTTP/1.0 404 Not Found |
Mail.Ru Group | HTTP/1.1 200 OK |
Microsoft | HTTP/1.1 200 OK |
Opera Software ASA | HTTP/1.1 404 Not Found |
Rusonyx | HTTP/1.1 301 Moved Permanently |
UIDG | HTTP/1.1 404 Not Found |
Zfort group | HTTP/1.1 404 Not Found |
VimpelCom (Beeline) | HTTP/1.1 302 Redirect |
Mosigra | HTTP/1.1 404 Not Found |
Nordavind | HTTP/1.1 200 OK |
Yandex | HTTP/1.1 404 Not Found |
Request 4
The previous request, but with "Host:" added. It differs from the first request only in the protocol version.
GET /path/to/resource.html HTTP/1.0
Host: domain.name
The Host header had a very positive effect on the servers: every one answered "200 OK", but only Intel and KolibriOS Project Team answered with HTTP/1.0.
Request 5
An HTTP/1.0 request with the full address and without "Host:". It would be nice to see "HTTP/1.0 200 OK".
GET http://domain.name/path/to/resource.html HTTP/1.0
The picture is exactly the same as for the previous request, except that e-Legion Ltd. returned HTTP/1.1 500 INTERNAL SERVER ERROR.
Request 6
The previous request, but with "Host:" added. It differs from the second request only in the protocol version.
GET http://domain.name/path/to/resource.html HTTP/1.0
Host: domain.name
The results are exactly the same as for the fourth request, that is, "Host:" fixed the internal error on the e-Legion Ltd. server.
Request 7
A variant of the second request with the full address, but "Host:" contains a nonexistent subdomain. The request is entirely correct, so the server must respond with "HTTP/1.1 200 OK".
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: void.domain.name
Request 8
Now we put a nonexistent domain in "Host:". As far as the standard is concerned nothing has changed in the request, but some servers may not like it.
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: local.fake
Request 9
The "Host:" header must be ignored completely, so let's write arbitrary text in it that many passwords would envy. According to the standard, we expect "HTTP/1.1 200 OK".
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: l-IjFN=fiG(w+J2p:#.{92!M`d^?
The servers answered requests 7-9 in the same way:
Company | Server response | "Server:" header |
---|---|---|
Apps4all | HTTP/1.1 200 OK | nginx/1.0.15 |
Badoo | HTTP/1.1 200 OK | nginx |
Box Overview | HTTP/1.1 200 OK | nginx/1.2.1 |
Devconf | HTTP/1.1 500 Internal Server Error | nginx/1.0.15 |
e-Legion Ltd. | HTTP/1.1 500 INTERNAL SERVER ERROR | nginx/1.0.5 |
IBM | HTTP/1.1 200 OK | IBM_HTTP_Server |
Intel | HTTP/1.0 400 Bad Request | AkamaiGHost |
JetBrains | HTTP/1.1 200 OK | nginx |
KolibriOS Project Team | HTTP/1.1 200 OK | lighttpd/1.4.32 |
Mail.Ru Group | HTTP/1.1 200 OK | nginx/1.2.5 |
Microsoft | HTTP/1.1 200 OK | Microsoft-IIS/7.5 |
Opera Software ASA | HTTP/1.1 200 OK | nginx |
Rusonyx | HTTP/1.1 200 OK | nginx |
UIDG | HTTP/1.1 200 OK | Apache |
Zfort group | HTTP/1.1 200 OK | nginx/1.4.1 |
VimpelCom (Beeline) | HTTP/1.1 200 OK | Microsoft-IIS/7.5 |
Mosigra | HTTP/1.1 200 OK | nginx/1.4.1 |
Nordavind | HTTP/1.1 200 OK | nginx/1.0.4 |
Yandex | HTTP/1.1 200 OK | nginx/1.2.1 |
Request 10
The first of the incorrect requests. We send the correct "Host:", but add a nonexistent subdomain to the full address.
GET http://fake.domain.name/path/to/resource.html HTTP/1.1
Host: domain.name
Since we have now moved on to erroneous requests, the results should not be alarming.
Company | Server response |
---|---|
Apps4all | HTTP/1.1 301 Moved Permanently |
Badoo | HTTP/1.1 301 Moved Permanently |
Box Overview | HTTP/1.1 200 OK |
Devconf | HTTP/1.1 404 Not Found |
e-Legion Ltd. | HTTP/1.1 301 Moved Permanently |
IBM | HTTP/1.1 200 OK |
Intel | HTTP/1.1 200 OK |
JetBrains | HTTP/1.1 301 Moved Permanently |
KolibriOS Project Team | HTTP/1.1 404 Not Found |
Mail.Ru Group | HTTP/1.1 200 OK |
Microsoft | HTTP/1.1 200 OK |
Opera Software ASA | HTTP/1.1 404 Not Found |
Rusonyx | HTTP/1.1 301 Moved Permanently |
UIDG | HTTP/1.1 404 Not Found |
Zfort group | HTTP/1.1 404 Not Found |
VimpelCom (Beeline) | HTTP/1.1 302 Redirect |
Mosigra | HTTP/1.1 301 Moved Permanently |
Nordavind | HTTP/1.1 200 OK |
Yandex | HTTP/1.1 404 Not Found |
Almost a third of the servers took the trouble to suggest the correct path (with a redirect). Unfortunately, many of them simply redirect to the home page.
Request 11
Now let's try a nonexistent domain.
GET http://local.fake/path/to/resource.html HTTP/1.1
Host: domain.name
The results are exactly the same as for the previous request, except that Mosigra returned "HTTP/1.1 404 Not Found" instead of "HTTP/1.1 301 Moved Permanently".
Request 12
Will arbitrary text work as a domain at all?
GET http://l-IjFN=fiG(w+J2p:#.{92!M`d^?/path/to/resource.html HTTP/1.1
Host: domain.name
The answer "HTTP/1.1 200 OK" came from Intel and Opera Software ASA. IBM and Mosigra returned HTTP/1.1 404 Not Found. All the rest returned 400 Bad Request, and some of them sent no headers at all (possible in HTTP/1.0).
Request 13
A copy of the eleventh request, but with a subdomain in "Host:" as well. It hardly makes sense to check other incorrect combinations.
GET http://local.fake/path/to/resource.html HTTP/1.1
Host: void.domain.name
The results were also a copy of request 11, except that Intel gave up and returned "HTTP/1.0 400 Bad Request".
Request 14
The second request, but with a nonexistent protocol in the full address. This must surely be an error.
GET habr://domain.name/path/to/resource.html HTTP/1.1
Host: domain.name
It turned out that quite a few sites accept the HABR protocol:
Company | Server response |
---|---|
Apps4all | HTTP/1.1 200 OK |
Badoo | HTTP/1.1 200 OK |
Box Overview | HTTP/1.1 200 OK |
Devconf | HTTP/1.1 200 OK |
e-Legion Ltd. | HTTP/1.1 200 OK |
IBM | HTTP/1.1 200 OK |
Intel | HTTP/1.0 400 Bad Request |
JetBrains | HTTP/1.1 200 OK |
KolibriOS Project Team | HTTP/1.1 301 Moved Permanently |
Mail.Ru Group | HTTP/1.1 200 OK |
Microsoft | HTTP/1.1 400 Bad Request |
Opera Software ASA | HTTP/1.1 400 BAD_REQUEST |
Rusonyx | HTTP/1.1 200 OK |
UIDG | HTTP/1.1 200 OK |
Zfort group | HTTP/1.1 200 OK |
VimpelCom (Beeline) | HTTP/1.1 400 Bad Request |
Mosigra | HTTP/1.1 400 BAD_REQUEST |
Nordavind | HTTP/1.1 200 OK |
Yandex | HTTP/1.1 200 OK |
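The host rules of section 5.2 only say where the host comes from. One possible strict policy, stated here as an assumption rather than a quote from the standard, is to accept an absoluteURI only with the http scheme and a syntactically valid host name and to answer 400 otherwise, which is what Intel, Microsoft, Opera Software ASA, VimpelCom (Beeline) and Mosigra effectively did for request 14. `validateAbsoluteTarget` is a hypothetical helper with a simplified host-name check.

```javascript
// Returns 400 for absoluteURI targets this (hypothetical) server refuses,
// or null if the target passes. abs_path targets are not checked here.
function validateAbsoluteTarget(requestTarget) {
  if (requestTarget.startsWith('/')) {
    return null;
  }
  // scheme "://" authority, where authority runs up to "/", "?" or "#"
  const m = /^([A-Za-z][A-Za-z0-9+.-]*):\/\/([^\/?#]*)/.exec(requestTarget);
  if (!m || m[1].toLowerCase() !== 'http') {
    return 400; // not an http absoluteURI, e.g. "habr://..."
  }
  // host[:port] made of dot-separated labels of letters, digits, hyphens
  const hostPort = /^([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)(?::\d+)?$/.exec(m[2]);
  if (!hostPort) {
    return 400;
  }
  // simplified label rules: at most 63 chars, no leading/trailing hyphen
  const labels = hostPort[1].split('.');
  const ok = labels.every(
    (l) => l.length <= 63 && !l.startsWith('-') && !l.endsWith('-'));
  return ok ? null : 400;
}
```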
Request 15
Let's try to finally break the servers' resistance by sending the previous request with an incorrect subdomain.
GET habr://void.domain.name/path/to/resource.html HTTP/1.1
Host: domain.name
The results are similar to those of the tenth request, but there are some changes:
Company | Request 10 | Request 15 |
---|---|---|
Apps4all | HTTP/1.1 301 Moved Permanently | HTTP/1.1 301 Moved Permanently |
Badoo | HTTP/1.1 301 Moved Permanently | HTTP/1.1 301 Moved Permanently |
Box Overview | HTTP/1.1 200 OK | HTTP/1.1 200 OK |
Devconf | HTTP/1.1 404 Not Found | HTTP/1.1 404 Not Found |
e-Legion Ltd. | HTTP/1.1 301 Moved Permanently | HTTP/1.1 301 Moved Permanently |
IBM | HTTP/1.1 200 OK | HTTP/1.1 200 OK |
Intel | HTTP/1.1 200 OK | HTTP/1.0 400 Bad Request |
JetBrains | HTTP/1.1 301 Moved Permanently | HTTP/1.1 301 Moved Permanently |
KolibriOS Project Team | HTTP/1.1 404 Not Found | HTTP/1.1 301 Moved Permanently |
Mail.Ru Group | HTTP/1.1 200 OK | HTTP/1.1 200 OK |
Microsoft | HTTP/1.1 200 OK | HTTP/1.1 400 Bad Request |
Opera Software ASA | HTTP/1.1 404 Not Found | HTTP/1.1 400 BAD_REQUEST |
Rusonyx | HTTP/1.1 301 Moved Permanently | HTTP/1.1 301 Moved Permanently |
UIDG | HTTP/1.1 404 Not Found | HTTP/1.1 404 Not Found |
Zfort group | HTTP/1.1 404 Not Found | HTTP/1.1 404 Not Found |
VimpelCom (Beeline) | HTTP/1.1 302 Redirect | HTTP/1.1 400 Bad Request |
Mosigra | HTTP/1.1 301 Moved Permanently | HTTP/1.1 400 BAD_REQUEST |
Nordavind | HTTP/1.1 200 OK | HTTP/1.1 200 OK |
Yandex | HTTP/1.1 404 Not Found | HTTP/1.1 404 Not Found |
Request 16
Let's try an arbitrary domain.
GET habr://local.fake/path/to/resource.html HTTP/1.1
Host: domain.name
The results matched the previous request.
Request 17
And for the third time, let's replace the domain with arbitrary text.
GET habr://l-IjFN=fiG(w+J2p:#.{92!M`d^?/path/to/resource.html HTTP/1.1
Host: domain.name
No server responded positively any more. Compared to request 12, the following sites changed:
Company | Request 12 | Request 17 |
---|---|---|
Intel | HTTP/1.1 200 OK | HTTP/1.0 400 Bad Request |
KolibriOS Project Team | HTTP/1.1 400 Bad Request | HTTP/1.1 301 Moved Permanently |
Opera Software ASA | HTTP/1.1 200 OK | HTTP/1.1 400 BAD_REQUEST |
Mosigra | HTTP/1.1 404 Not Found | HTTP/1.1 400 BAD_REQUEST |
Request 18
And now let's also replace the correct "Host:" header.
GET habr://l-IjFN=fiG(w+J2p:#.{92!M`d^?/path/to/resource.html HTTP/1.1
Host: local.fake
Only one change from the previous result: the KolibriOS Project Team server started returning "HTTP/1.1 404 Not Found" instead of "HTTP/1.1 301 Moved Permanently".
Request N
Write in the comments if you want to try other request variants. Or you can do it yourself.
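To try other variants yourself, it helps to write the templates down as data. This is a sketch, not the author's full_http_check: `buildTemplates` is a hypothetical name, each entry is a request line plus a Host value (`null` meaning no Host header), and the list reproduces requests 1-19 of this article in order (19 comes from the update at the end). The pairs could presumably be passed to http_check, whose signature is visible in the request 19 template.

```javascript
// Hypothetical re-creation of the request templates used in this article,
// as [requestLine, hostHeader] pairs; hostHeader === null means "no Host".
function buildTemplates(domain, path) {
  const junk = 'l-IjFN=fiG(w+J2p:#.{92!M`d^?';
  const abs = (scheme, host, ver) =>
    'GET ' + scheme + '://' + host + path + ' HTTP/' + ver;
  const rel = (ver) => 'GET ' + path + ' HTTP/' + ver;
  return [
    [rel('1.1'), domain],                                 // 1
    [abs('http', domain, '1.1'), domain],                 // 2
    [rel('1.0'), null],                                   // 3
    [rel('1.0'), domain],                                 // 4
    [abs('http', domain, '1.0'), null],                   // 5
    [abs('http', domain, '1.0'), domain],                 // 6
    [abs('http', domain, '1.1'), 'void.' + domain],       // 7
    [abs('http', domain, '1.1'), 'local.fake'],           // 8
    [abs('http', domain, '1.1'), junk],                   // 9
    [abs('http', 'fake.' + domain, '1.1'), domain],       // 10
    [abs('http', 'local.fake', '1.1'), domain],           // 11
    [abs('http', junk, '1.1'), domain],                   // 12
    [abs('http', 'local.fake', '1.1'), 'void.' + domain], // 13
    [abs('habr', domain, '1.1'), domain],                 // 14
    [abs('habr', 'void.' + domain, '1.1'), domain],       // 15
    [abs('habr', 'local.fake', '1.1'), domain],           // 16
    [abs('habr', junk, '1.1'), domain],                   // 17
    [abs('habr', junk, '1.1'), 'local.fake'],             // 18
    [abs('http', domain, '1.1'), domain + '\0fake_and_void_text'], // 19
  ];
}
```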
Conclusion
Let's try to sum up. Almost all the servers reviewed responded correctly to HTTP/1.1 requests. The exceptions were DevConf, e-Legion Ltd. and Intel. The first two use nginx, so the problem most likely lies in its configuration. Intel uses AkamaiGHost, which is either misconfigured or does not support HTTP/1.1. I suspect that nginx is one of the reasons the other servers passed the tests correctly (14 of the 19 servers use it). Thanks to the difference in versions, a chain of nginx/1.0.10 and nginx/1.4.1 was discovered at UIDG.
Think it is all simple? Try configuring Apache with SEO in mind so that it correctly handles requests with an erroneous "Host:" and relies only on the full address in the request line.
What is the practical point of these "incorrect" correct requests? I doubt any vulnerability can be found this way. But has no one really learned to build correct HTTP/1.1 servers in almost fifteen years?
P.S. Remember the differences between %{REQUEST_URI} in Apache mod_rewrite and $_SERVER["REQUEST_URI"] in PHP.
UPD1: Request 19
On the advice of AEP, I took the second request but appended a zero byte and some string to the host. The point was to see how well a server ignores a Host that contains a zero byte.
GET http://domain.name/path/to/resource.html HTTP/1.1
Host: domain.name{zero byte}fake_and_void
I added the following template to the script:
http_check(title, '19', parts[1], 'GET http://' + parts[1] + parts[2] + ' HTTP/1.1', parts[1] + '\0fake_and_void_text');
All servers returned "HTTP/1.1 400 Bad Request", except IBM, Opera Software ASA and Mosigra.
When I tried adding a zero byte to the request itself, everyone except IBM and Opera Software reported error 400.
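Most servers simply refused such a header. A minimal sketch of that kind of guard, assuming the server rejects any Host value containing ASCII control characters (RFC 2616 builds header field values from TEXT, which excludes control characters). `checkHostValue` is a hypothetical name. Whether a server should apply it to an absoluteURI request, where section 5.2 says the Host value is ignored, is debatable; the sketch applies it unconditionally.

```javascript
// Hypothetical guard: a Host value containing ASCII control characters
// (0x00-0x1F or 0x7F), such as the NUL byte of request 19, is refused
// with 400 (Bad Request) before the value is used anywhere else.
function checkHostValue(value) {
  if (/[\x00-\x1f\x7f]/.test(value)) {
    return 400;
  }
  return null; // no objection from this check
}

// Same Host value as in the request 19 template
console.log(checkHostValue('domain.name' + '\0fake_and_void_text')); // 400
console.log(checkHostValue('domain.name')); // null
```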