
Debugging RewriteRule rules, or a little about the intimate life of mod_rewrite

RewriteEngine has always been a pretty stressful topic for me. Only recently did I discover that everything had somehow settled down and become more or less clear. Since I'm a completely ordinary person, I'm sure I am not the only one who could not make sense of web server configuration errors, so I am hurrying to share my experience.

The result is a cross between a guide to the mod_rewrite module and a kind of reference for configuring a web server through the .htaccess file. Along the way I would like to focus on particularly difficult or non-obvious points.

It is assumed that the reader uses URL rewriting in their work, knows in general terms what RewriteEngine is, and has already spent several hours setting it up. This article is not quite for beginners, but it is not for super-pros either.

Baseline Data for Experiments




Configuring virtual hosts


To make work and debugging easier, and to avoid fraying our nerves where that can be avoided, it is worth configuring virtual hosts with convenience in mind. Let's consider the simplest settings that make life much easier.

Setting up the logs

Few people have only one domain on a local server. Usually there are many. It would be nice to split the logs by domain and by day so that they do not grow too large. This is done in the <VirtualHost> section of the server configuration.

For the error log on our domain, add the following two lines.

 ErrorLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log 86400"
 LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

The first one sets the name of the error log for the virtual host and starts a new log file every 86400 seconds, that is, once a day. rotatelogs is a program that ships with the Apache web server, so I hope you have it installed too.

The second line defines a log line format under the nickname combined. Strictly speaking, LogFormat describes access log lines (the error log has its own directive, ErrorLogFormat), and the nickname is what the CustomLog line refers to. Details can be found in the Apache server documentation. For this article it is enough to keep in mind that the format is customizable.

For the access log I add only one line. The default line format usually suits me.

  CustomLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-access.%Y.%m.%d.log 86400" combined 

In both cases, pay attention to the paths to the web server programs and to the logs. Use the paths that exist on your machine.
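
To see which file a given log entry lands in, here is a rough Python sketch of how the file name follows from the pattern and the 86400-second interval. It assumes rotatelogs is run without the -l option, in which case, as far as I know, rotation periods are aligned to UTC. The function name log_file_for is made up for this illustration.

```python
import calendar
import time

ROTATION_SECONDS = 86400  # the interval passed to rotatelogs in the ErrorLog line
PATTERN = "/opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log"


def log_file_for(timestamp: float) -> str:
    """Return the log file name covering the given Unix timestamp (hypothetical helper)."""
    # Assumption: periods start at multiples of the interval, counted in UTC.
    period_start = int(timestamp) - int(timestamp) % ROTATION_SECONDS
    return time.strftime(PATTERN, time.gmtime(period_start))


if __name__ == "__main__":
    moment = calendar.timegm((2015, 8, 8, 15, 41, 38, 0, 0, 0))
    print(log_file_for(moment))
```

For a request made on 8 August 2015 this gives the engine-bbb-error.2015.08.08.log file that appears later in the article.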

The most general information about how everything works


Suppose we want to enable URL rewriting in some folder of our domain. To do this, we put the line

 RewriteEngine on 

in the .htaccess file that governs this folder. Below this line we also have other directives and a few rewrite rules.

Suppose the server receives a URL as input. mod_rewrite starts checking this URL against the rules, in order from top to bottom. If the input URL does not match any rule, it "passes through". For example, suppose we have an index.php file in the root folder. If the input URI "/index.php" does not match any rule, we will see the output of this script in the browser.

If we have the following rule

 RewriteRule ^index\.php$ / [L] 

then, obviously, this rule will fire for the URI "index.php". The URI will be rewritten to "/" and the new URI "/" will be fed back to the server as input. The whole process of applying the rules then starts again. Only if the URI "/" matches none of our rules will we see what we wanted. If it does match one, it will be rewritten again and everything repeats.

How the [L] flag works


This flag probably causes a lot of misunderstanding. If a rule with this flag fires, the input URI is not checked against the rules that follow it. That is all. That is, if our "index.php" matched (the rule fired for it), then because of the [L] flag all subsequent checks are skipped, the web server immediately rewrites "index.php" -> "/" and receives the URI "/" as new input ([INTERNAL REDIRECT]), and everything repeats from the beginning, from the first rule. If the flag is absent, the rewrite still happens and checking continues with the next rule, but the URI has already been changed, namely to "/".

Understanding this process immediately prevents many redirect loops.

But wait: does this mean that if we do not use the [L] flag, we save time and the page opens faster? With [L] we stop at the flag and then have to go through all the rules again from the top. Without [L], would we simply rewrite the URI in the rule that fired, walk to the end of the rules, and be done?

I checked. It does not work that way. Without [L] flags the module, as expected, replaces the URI in the rule that fired, runs through all the remaining rules to the end, then performs an [INTERNAL REDIRECT] and goes through all the rules again with the new URI. This confirms what was written above. In my experiments this rule had no exceptions.

Conclusion: in .htaccess (per-directory context), whenever a RewriteRule fires, an [INTERNAL REDIRECT] occurs and all the rules are applied again. This second pass begins either immediately after the rule with the [L] flag, or after the rules run out if we work without [L] flags. A URL "passes through" only if no rule was applied to it. So the [L] flag really can reduce URI processing time and should be used wherever possible.
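
The conclusion can be played with in a few lines of Python. This is only a toy model of the behaviour described above, not a reimplementation of mod_rewrite: it assumes a .htaccess in the document root, has no backreferences, conditions or RewriteBase, and the names one_pass and process are invented for the illustration.

```python
import re
from typing import List, Tuple

# (pattern, substitution, has_L_flag): a simplified stand-in for a RewriteRule line.
Rule = Tuple[str, str, bool]


def one_pass(uri: str, rules: List[Rule]) -> Tuple[str, bool]:
    """Apply the rules top down once; return the resulting URI and whether any rule fired."""
    current = uri.lstrip("/")  # per-dir prefix stripped (document-root .htaccess assumed)
    fired = False
    for pattern, substitution, last in rules:
        if re.search(pattern, current):
            current = substitution  # later rules see the already rewritten URI
            fired = True
            if last:  # [L]: skip the remaining rules of this pass
                break
    return "/" + current.lstrip("/"), fired


def process(uri: str, rules: List[Rule], max_passes: int = 10) -> List[str]:
    """Repeat passes until no rule fires; return the URI at the start of each pass."""
    seen = [uri]
    for _ in range(max_passes):
        uri, fired = one_pass(uri, rules)
        if not fired:
            return seen  # "pass through"
        seen.append(uri)  # internal redirect: start again from the first rule
    raise RuntimeError("redirect loop: " + " -> ".join(seen))
```

With the single rule from above, process("/index.php", [(r"^index\.php$", "/", True)]) returns ["/index.php", "/"]: one rewrite, then a second pass in which nothing fires. A pair of rules that send "a" to "/b" and "b" to "/a" never reaches a pass-through and ends in the RuntimeError, which is the model's version of a redirect loop.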

What is RewriteBase?


This directive, in my opinion, holds the record for incomprehensibility! I would give it a prize for that! So I have two stories about this beast: a short one and a long one. The short story is for those who do not want to dig into this directive. The long one is for the curious.

Short story

If you are doing relatively simple URL rewriting using .htaccess files, I recommend that you always proceed as follows.



Long story

When rewriting, the following processes will occur:

We have:

We request the URL engine.bbb.ru/ind.php

If there is no .htaccess file in the /opt/lampp/htdocs/bbb/_engine/local folder, or RewriteEngine is not enabled there


If the /opt/lampp/htdocs/bbb/_engine/local folder contains a .htaccess file and RewriteEngine is enabled

Attention! This algorithm is always executed. It expresses what the term "per-dir" means, that is, the per-directory approach built into the Apache server. The value of the RewriteBase directive does not affect the algorithm.

What is the effect of the RewriteBase directive?

Remember well that the RewriteBase directive takes a URL path, not a filesystem path! You cannot specify "local/" there: that is an error. Only "/local" will do.

Let us specify in our /opt/lampp/htdocs/bbb/_engine/local/.htaccess

 RewriteBase /local 

We request the URL engine.bbb.ru/local/

Then the rule

 RewriteRule ^$ ind1.php 

will fire! The request then moves on to the URI /local/ind1.php

And the rule

 RewriteRule ^$ /ind1.php 

will also fire, but the request moves on to the URI /ind1.php. File not found! We have no such URI (relative to the site root)!

Conclusion 1: The URL path that we specify in RewriteBase is added as a prefix to the target URI when the target is relative, that is, when it has no leading slash.

Conclusion 2: If we never use relative target uri in the rules, then we don’t need the RewriteBase directive!

Conclusion 3: If we use "RewriteBase /", then when the rule is triggered

 RewriteRule ^$ ind1.php 

an attempt will be made to go to the URI /ind1.php: "/" is simply used as the prefix.
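
The three conclusions fit into one small function. The Python below is only a sketch of the prefixing step as observed in these experiments. It ignores substitutions that are full URLs with a scheme, and the name apply_rewrite_base is invented for the illustration.

```python
def apply_rewrite_base(substitution: str, rewrite_base: str) -> str:
    """Return the URI used for the next pass (hypothetical helper, per Conclusions 1-3)."""
    if substitution.startswith("/"):
        # Absolute target: RewriteBase plays no part.
        return substitution
    # Relative target: RewriteBase becomes the prefix, joined by exactly one slash.
    return rewrite_base.rstrip("/") + "/" + substitution


if __name__ == "__main__":
    print(apply_rewrite_base("ind1.php", "/local"))   # relative target
    print(apply_rewrite_base("/ind1.php", "/local"))  # absolute target
    print(apply_rewrite_base("ind1.php", "/"))        # RewriteBase /
```

The three calls correspond to the three cases above and give /local/ind1.php, /ind1.php and /ind1.php respectively.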

Another experiment (a hooligan one)
We have the following rules in the root .htaccess:

 RewriteEngine on
 RewriteRule ^$ ind.php

With this in place we request engine.bbb.ru

If RewriteBase takes a URL, then let's set

 RewriteBase http://bbb.ru 

No. It does not pass. Error: "RewriteBase: argument is not a valid URL". Strange, isn't it? But we do not give up! Let's change RewriteBase!

 RewriteBase //bbb.ru 

In this case there is no error! What happens to the paths? Plenty of interesting things!
The server honestly takes the path /opt/lampp/htdocs/bbb/_engine/, strips the prefix /opt/lampp/htdocs/bbb/_engine/ from it and works with an empty string ('').
We hit the rule and replace the empty string with 'ind.php'.
The server honestly adds the prefix "//bbb.ru" and goes to the next pass. This second pass is equivalent to requesting engine.bbb.ru//bbb.ru/ind.php, which, by and large, is not what we wanted (the original idea was to jump to another site). In short, the idea did not pay off. As a result we get a 404 error, which is logical. By the way, during rewriting the server replaced "//" with "/". The trace of this example is given much further below.


How did I get all this breathtaking information about the intimate life of the Apache server? Or finally debugging


Really! How did I see the errors that the rewriting service produces? After all, that is debugging! There is a very useful directive that I put in the virtual host for the domain engine.bbb.ru. Namely

  LogLevel warn rewrite:trace4 

After adding it, I restarted Apache. From that point on, trace lines from the rewrite module started appearing in the domain's error log, namely in the file /opt/lampp/logs/engine-bbb-error.2015.08.08.log. There are a lot of lines. Why trace4? Could we use trace3? We could. But then it is impossible to debug RewriteCond: the detailed information about what is compared with which pattern is missing, and some other events disappear too (not so much important as interesting).

What is "warn"? Literally, our LogLevel line means that all modules log at the warn level and only the rewrite module logs at trace4.

What do we get by turning on debugging?

We get a trace, that is, a very, very detailed log. There really are a lot of trace lines. If I get badly stuck on the rules and, after some torment, give up and decide to turn tracing on, I first disable all the rules in my .htaccess that do not concern the URL in question by putting the comment sign "#" in front of them. Then I reload the page that does not work and look for the relevant lines in the log.

Here is the rewriting trace for the following conditions:

We request:
  http://engine.bbb.ru/ 

Rules:

 RewriteEngine on
 RewriteRule ^$ /ind.php [L]


 [Sat Aug 08 15:41:38.664920 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ ->
 [Sat Aug 08 15:41:38.664955 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri ''
 [Sat Aug 08 15:41:38.664960 2015] [rewrite:trace2] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] rewrite '' -> '/ind.php'
 [Sat Aug 08 15:41:38.664966 2015] [rewrite:trace1] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] internal redirect with /ind.php [INTERNAL REDIRECT]
 [Sat Aug 08 15:41:38.665040 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ind.php -> ind.php
 [Sat Aug 08 15:41:38.665044 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri 'ind.php'
 [Sat Aug 08 15:41:38.665046 2015] [rewrite:trace1] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] pass through /opt/lampp/htdocs/bbb/_engine/ind.php


Trace of the hooligan example from the long story section
 [Sat Aug 08 15:09:37.475389 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ ->
 [Sat Aug 08 15:09:37.475406 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri ''
 [Sat Aug 08 15:09:37.475411 2015] [rewrite:trace2] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] rewrite '' -> 'ind.php'
 [Sat Aug 08 15:09:37.475414 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] add per-dir prefix: ind.php -> /opt/lampp/htdocs/bbb/_engine/ind.php
 [Sat Aug 08 15:09:37.475418 2015] [rewrite:trace2] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] trying to replace prefix /opt/lampp/htdocs/bbb/_engine/ with //bbb.ru
 [Sat Aug 08 15:09:37.475420 2015] [rewrite:trace4] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] add subst prefix: ind.php -> //bbb.ru/ind.php
 [Sat Aug 08 15:09:37.475422 2015] [rewrite:trace1] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] internal redirect with //bbb.ru/ind.php [INTERNAL REDIRECT]
 [Sat Aug 08 15:09:37.475469 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] add path info postfix: /opt/lampp/htdocs/bbb/_engine/bbb.ru -> /opt/lampp/htdocs/bbb/_engine/bbb.ru/ind.php
 [Sat Aug 08 15:09:37.475473 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/bbb.ru/ind.php -> bbb.ru/ind.php
 [Sat Aug 08 15:09:37.475476 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri 'bbb.ru/ind.php'
 [Sat Aug 08 15:09:37.475478 2015] [rewrite:trace1] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] pass through /opt/lampp/htdocs/bbb/_engine/bbb.ru



Note that each line is marked with its level (rewrite:traceN). In principle, if you have found in the log the one line you need and want to see only lines of the same type, you can change the trace level, restart Apache and repeat the request. Personally I find that cumbersome. It is much easier, in my opinion, to first copy the lines into a separate file, selecting them only by the time of the request (to the minute), and then isolate the lines you need by stripping unnecessary information (search and replace). At first I even thought of writing a PHP tool for viewing logs of this kind. But then the need disappeared by itself (more on this below).
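
The copy-and-strip routine can also be scripted. The Python below is a rough sketch that assumes the log lines look exactly like the traces shown in this article. The regular expression and the name condense are made up for the illustration, and other Apache versions or ErrorLogFormat settings may well need a different pattern.

```python
import re
from typing import Iterable, Iterator, Optional

# Matches the trace lines as they appear in this article; other formats are an open question.
TRACE_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[rewrite:trace(?P<level>\d)\] "
    r".*?\[rid#(?P<rid>[^/\]]+)/(?P<phase>[^\]]+)\] "
    r"(?:\[perdir [^\]]+\] )?(?P<message>.*)$"
)


def condense(lines: Iterable[str], minute: Optional[str] = None) -> Iterator[str]:
    """Yield 'traceN phase: message' for rewrite trace lines, optionally for one HH:MM only."""
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # not a mod_rewrite trace line
        if minute is not None and f" {minute}:" not in match["time"]:
            continue  # outside the minute we are interested in
        yield f"trace{match['level']} {match['phase']}: {match['message']}"


if __name__ == "__main__":
    # The path is the log file named earlier in the article; adjust it to your own setup.
    with open("/opt/lampp/logs/engine-bbb-error.2015.08.08.log", encoding="utf-8") as log:
        for short_line in condense(log, minute="15:41"):
            print(short_line)
```

Each surviving line is reduced to the trace level, the request phase (initial or initial/redir#1) and the message itself, which is usually all that matters when following one request through the rules.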

Tracing applies only to the virtual host in which it is enabled.

If the engine.bbb.ru domain uses external CSS files served from the bbb.ru domain, and that is where the problem lies, then tracing must be enabled not in the engine.bbb.ru virtual host but in the bbb.ru one. All requests to the bbb.ru domain should then be looked up in the error logs (not the access logs!) of the bbb.ru domain. Note that requests for the traced objects will not appear in the access logs at all!

Can we do without such a stressful RewriteEngine at all?


You can switch to using just one script for the entire site and do all the rewriting in it. In PHP this is easier to do, and debugging is much easier. Besides the obvious advantages in terms of site security, we get rewriting without the hassle. To switch to this scheme, our .htaccess should look something like this:

 RewriteEngine on
 # Redirect from the host with "www" to the host without "www"
 RewriteCond %{HTTP_HOST} ^www\.our-site\.ru$
 RewriteRule ^(.*)$ http://our-site.ru/$1 [R=301,L]
 # The next 4 conditions list the requests that must not go to the dispatcher.
 RewriteCond %{REQUEST_URI} !favicon\.ico$
 RewriteCond %{REQUEST_URI} !robots\.txt$
 RewriteCond %{REQUEST_URI} !sitemap\.xml$
 RewriteCond %{REQUEST_URI} !^/dispatch\.php$
 RewriteRule ^.*$ /dispatch.php [L]
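
The four conditions in front of the last rule boil down to a single yes/no question about the request path. The Python sketch below only models that decision. The patterns are copied from the RewriteCond lines, while the names EXCLUDED and goes_to_dispatcher are invented for the illustration.

```python
import re

# Patterns copied from the RewriteCond lines; a request is dispatched only if none matches.
EXCLUDED = [r"favicon\.ico$", r"robots\.txt$", r"sitemap\.xml$", r"^/dispatch\.php$"]


def goes_to_dispatcher(request_uri: str) -> bool:
    """Return True if the last RewriteRule would send this request to /dispatch.php."""
    path = request_uri.split("?", 1)[0]  # %{REQUEST_URI} does not include the query string
    return not any(re.search(pattern, path) for pattern in EXCLUDED)


if __name__ == "__main__":
    for uri in ("/news/42", "/favicon.ico", "/dispatch.php"):
        print(uri, goes_to_dispatcher(uri))
```

The last condition is what stops the second pass: after the internal redirect the URI is /dispatch.php, the rule no longer fires, and the request passes through to the script. It also means that a direct request for /dispatch.php is served as is.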


And in the dispatch.php script itself, I strongly advise you not to forget to forbid calling dispatch.php directly.

 <?php
 // redirect_to_bad_uri() is the site's own handler; it is not defined here.
 if (preg_match('#^/dispatch\.php#', $_SERVER['REQUEST_URI']) === 1) {
     redirect_to_bad_uri();
 }


If you decide to adopt this approach, I recommend giving the dispatch.php script a different name. I used this name only for clarity.

By the way, this approach is being adopted quite actively. We have the spread of human-readable URLs to thank for that (although personally I find them rather unreadable). Practically all modern engines already work this way.

The virtual host section for engine.bbb.ru that I used
 <VirtualHost 127.0.0.9:80>
     ServerAdmin webmaster@serv1.ru
     DocumentRoot "/opt/lampp/htdocs/bbb/_engine"
     ServerName "engine.bbb.ru"
     ServerAlias "www.engine.bbb.ru"
     ScriptAlias /cgi/ "/opt/lampp/cgi-bin/"
     ScriptAlias /cgi-bin/ "/opt/lampp/cgi-bin/"
     ErrorLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log 86400"
     LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
     CustomLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-access.%Y.%m.%d.log 86400" combined
     # Uncomment the next line to enable mod_rewrite tracing.
     # LogLevel warn rewrite:trace4
 </VirtualHost>

Source: https://habr.com/ru/post/264395/

