
How mod_rewrite actually works. A guide for intermediate users

This article grew out of an idea for advanced training of our technical support staff who work with mod_rewrite. Practice has shown that after studying the many Russian-language tutorials, support staff solve typical problems well, but writing rules on their own proceeds by trial and a great deal of error. The problem is that a good understanding of how mod_rewrite works requires studying the original English-language documentation, followed by either additional explanations or hours of experiments with RewriteLog.

This article describes how mod_rewrite works internally. Understanding these principles lets you see clearly what each directive does and what is happening inside mod_rewrite at any given moment while it processes the directives.

I assume that the reader is already familiar with what mod_rewrite is, so I will not describe its basics, which are easy to find online. Note also that the article covers mod_rewrite as used in a .htaccess file. Differences in the <VirtualHost> context are outlined at the end of the article.
So, you have studied mod_rewrite, written a few RewriteRules, and run into endless redirects, a rule that for some reason does not catch your request, and a group of rules that behaves unpredictably when a later rule unexpectedly changes a request that the earlier rules painstakingly prepared.

Why does this happen?

What does RewriteRule work with?


The first RewriteRule receives the path from the directory containing .htaccess to the requested file. This string never starts with "/". Each subsequent RewriteRule receives the result of the previous transformations.

To understand thoroughly how a RewriteRule works, you must first determine what it works with. Let's look at how Apache obtains the string that is initially passed to a RewriteRule in .htaccess.

When you first start working with mod_rewrite, it seems logical to assume that it works with links. However, with mod_rewrite in .htaccess this is not the case. In fact, RewriteRule receives not the link but the path to the requested file.

Because of Apache's internal architecture, by the time .htaccess comes into play, mod_rewrite can only operate on the path to the file being processed. Other modules (for example, mod_alias) may already have changed the request before it reaches mod_rewrite, so the final path to the file on the site may not match the original link. If mod_rewrite worked with the original link, it would break the work of the modules that modified the request before it.

Therefore, mod_rewrite receives the absolute path to the file being processed. mod_rewrite also knows the path to the .htaccess file that contains the RewriteRule rules. To turn the file path into something like the link the site developer expects to work with, mod_rewrite cuts the path to the .htaccess file off the front of the absolute path.

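It is this path, with the path to .htaccess cut off, that is passed to the first RewriteRule. For example, if .htaccess is located at /var/www/example.com/.htaccess and the file being processed is /var/www/example.com/templates/index.html, the first RewriteRule receives "templates/index.html".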

The path to .htaccess is cut off together with the trailing slash. One consequence follows from this: the string initially passed to RewriteRule never starts with "/".

It is important to remember what RewriteRule does not do. It does not see the site name, the arguments passed to the script, or even the whole link if .htaccess is not located in the site root. All of this is handled by RewriteCond, which will be touched on briefly later. So:

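# Won't work: the pattern starts with "/"
RewriteRule ^/index.php$ /my-index.php

# Won't work: RewriteRule does not see the site name
RewriteRule ^example.com/.* http://www.example.com

# Won't work: the link arguments are not passed to RewriteRule
RewriteRule index.php\?newspage=([0-9]+) news.php?page=$1

# Works only if .htaccess is in the same directory as templates,
# for example in the site root. If .htaccess is in templates/.htaccess,
# the rule will NOT work, because mod_rewrite cuts off the path to .htaccess
# and the string reaching RewriteRule no longer contains "templates/"
RewriteRule ^templates/common/yandex-money.gif$ templates/shared/yad.gif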
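A toy Python model of the path-stripping step may make the last case more concrete. The function name per_dir_input and the /var/www/example.com/ paths are illustrative assumptions; the sketch only mimics the behaviour described above and is not taken from mod_rewrite's source.

```python
def per_dir_input(file_path: str, htaccess_dir: str) -> str:
    """Toy model of the string the first RewriteRule in .htaccess receives.

    The directory holding .htaccess, including its trailing slash, is cut
    off the absolute path of the file being processed.
    """
    prefix = htaccess_dir.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        raise ValueError("the file is outside the .htaccess directory")
    return file_path[len(prefix):]


FILE = "/var/www/example.com/templates/common/yandex-money.gif"

# .htaccess in the site root: "templates/" is still part of the string.
print(per_dir_input(FILE, "/var/www/example.com/"))  # templates/common/yandex-money.gif
# .htaccess in templates/: "templates/" is gone, so ^templates/... cannot match.
print(per_dir_input(FILE, "/var/www/example.com/templates/"))  # common/yandex-money.gif
```

Note that the result never starts with "/", which is why a pattern such as ^/index.php$ can never match in .htaccess.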


When you are just starting out with mod_rewrite, I recommend using it only in the .htaccess file in the site root. This makes it somewhat easier to keep track of what it does.

We have figured out what RewriteRule works with. Now let's see how it works.

How RewriteRule Works


RewriteRule simply transforms a string using regular expressions, and that's all. RewriteRule works with a string, not with a link or a file path.

As we found out above, RewriteRule receives as input the path from .htaccess to the requested file. It is now most convenient to abstract away from paths and links and treat what RewriteRule works with as an ordinary string. This string is passed from RewriteRule to RewriteRule and is modified whenever one of the RewriteRules matches.

In general, setting aside the intricacies of flags (some of which are discussed below) and the difficulty of constructing regular expressions (which this article barely touches on), RewriteRule works VERY simply:
  1. Take the string.
  2. Compare it with the regular expression in the first argument.
  3. If it matches, replace the entire string with the value of the second argument.
  4. Pass the string on to the next RewriteRule.
That's basically all there is to it. To illustrate that RewriteRule works with a string, consider the following fantastical example:

# Request: http://mysite.com/info.html
# The first RewriteRule receives "info.html"

# Turn the request into an arbitrary string.
RewriteRule ^info.html$ "I saw a turtle in the hole. And it was dancing rock-n-roll. And it was smiling. All in all, it was a very funny doll."

# "info.html" -> "I saw a turtle..."

# Turn this string into an external link.
RewriteRule turtle https://example.com/information/index.html

# "I saw a turtle..." -> "https://example.com/information/index.html"

# Change the domain!
RewriteRule ^(.*)example.com(.*)$ $1example.org$2

# "https://example.com/information/index.html" -> "https://example.org/information/index.html"

# Change the protocol!
RewriteRule ^https:(.*)$ ftp:$1

# "https://example.org/information/index.html" -> "ftp://example.org/information/index.html"

# Change the file name.
RewriteRule ^(.*)/index.html$ $1/main.php

# "ftp://example.org/information/index.html" -> "ftp://example.org/information/main.php"


As you can see, RewriteRule does not care what it works with: it simply transforms the string according to the arguments it is given. If you like, you can store arbitrary data in the string, and with enough perseverance and a good knowledge of regular expressions you could even write tic-tac-toe in RewriteRules.
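The four steps listed earlier fit in a few lines of Python, which can also replay the turtle example. This is a simplified model under stated assumptions: Python's re module stands in for Apache's PCRE (equivalent for these simple patterns), only $0 to $9 backreferences are handled, and flags, RewriteCond and RewriteBase are ignored. The names expand, apply_rules and TURTLE_RULES are made up for illustration.

```python
import re


def expand(match, substitution):
    """Fill in $0..$9 from the capture groups of a successful match."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", substitution)


def apply_rules(s, rules):
    """Toy model of a RewriteRule chain: each rule sees the previous rule's output."""
    for pattern, substitution in rules:
        match = re.search(pattern, s)
        if match:
            # On a match the WHOLE string is replaced, not just the matched part.
            s = expand(match, substitution)
    return s


TURTLE_RULES = [
    (r"^info.html$", "I saw a turtle in the hole. And it was dancing rock-n-roll. "
                     "And it was smiling. All in all, it was a very funny doll."),
    (r"turtle", "https://example.com/information/index.html"),
    (r"^(.*)example.com(.*)$", "$1example.org$2"),
    (r"^https:(.*)$", "ftp:$1"),
    (r"^(.*)/index.html$", "$1/main.php"),
]

print(apply_rules("info.html", TURTLE_RULES))  # ftp://example.org/information/main.php
```

The result matches the last comment in the configuration above. A string that matches none of the patterns, such as "contacts.html", passes through the chain unchanged.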

A remark is in order here: although RewriteRule works with a plain string, it is still oriented toward working with links. So it reacts in a special way to strings beginning with "https://" and similar prefixes (it notes that we want an external redirect) and to the "?" character (it treats the characters that follow as query parameters to be substituted into the request). But that does not concern us right now; what matters is that there is no magic in RewriteRule: it just takes the string and changes it as you told it to. We will look at external redirects and parameters later in the article; there are a few things to say about them too.

After all transformations have been performed and the last RewriteRule has run, RewriteBase comes into play.

What is RewriteBase for?


If the request after transformation is relative and differs from the original one, RewriteBase is prepended to it. RewriteBase must be specified in .htaccess, and its value is the path from the site root to the .htaccess file.
RewriteBase is applied only after all RewriteRules, not between them.

We said above that mod_rewrite working in .htaccess receives the absolute path to the requested file. To pass it to RewriteRule, mod_rewrite cuts off the path to .htaccess. Then the RewriteRules change the request one after another. Once the request has been changed, Apache has to restore the absolute path to the file it should eventually process. RewriteBase is essentially a hack that helps restore the original path to the file.

RewriteBase is applied after all transformations. This means it does not change the request between RewriteRules; it takes effect only once all the RewriteRules have run.

After all transformations, RewriteBase checks whether the resulting path is relative or absolute. In the Apache context this means relative or absolute counting from the site root: "images/logo.gif" is relative, while "/images/logo.gif" is absolute because it starts with a slash.
If the path is absolute, RewriteBase does nothing. If it is relative, RewriteBase is prepended to it. This works for both internal and external redirects:

# .htaccess is located in /images/,
# so RewriteBase is /images/
RewriteBase /images/

# Request: http://example.com/images/logo.gif
# RewriteRule receives "logo.gif"
RewriteRule ^logo.gif$ logo-orange.gif
# RewriteRule: "logo.gif" -> "logo-orange.gif"
# RewriteBase: "logo-orange.gif" -> "/images/logo-orange.gif"

# Request: http://example.com/images/header.png
# RewriteRule receives "header.png"
RewriteRule ^header.png$ /templates/rebranding/header.png
# RewriteRule: "header.png" -> "/templates/rebranding/header.png"
# RewriteBase: does nothing, because the path starts with "/".

# Request: http://example.com/images/director.tiff
# RewriteRule receives "director.tiff"
# External relative redirect
RewriteRule ^director.tiff$ staff/manager/director.tiff [R=301]
# RewriteRule: "director.tiff" -> "staff/manager/director.tiff"
# The path is relative, and mod_rewrite has noted that an external redirect is needed
# RewriteBase: "staff/manager/director.tiff" -> "/images/staff/manager/director.tiff"
# mod_rewrite then turns the path into an external link:
# "/images/staff/manager/director.tiff" -> http://example.com/images/staff/manager/director.tiff
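The RewriteBase decision can be modelled as a small Python function. It is a sketch, not Apache's implementation: it ignores the unchanged-request case, treats anything with a URL scheme as a full link, and external_link with its http://example.com default is a hypothetical helper standing in for however mod_rewrite builds the redirect target.

```python
from urllib.parse import urlsplit


def apply_rewrite_base(result: str, rewrite_base: str) -> str:
    """Toy model of RewriteBase, applied once after the last RewriteRule.

    Simplification: the "request unchanged" case is not modelled.
    """
    if result.startswith("/") or urlsplit(result).scheme:
        return result  # absolute path or full URL: RewriteBase does nothing
    return rewrite_base.rstrip("/") + "/" + result  # relative: prepend the base


def external_link(path: str, origin: str = "http://example.com") -> str:
    """Hypothetical helper: turn the final path into a redirect target."""
    return path if urlsplit(path).scheme else origin + path


print(apply_rewrite_base("logo-orange.gif", "/images/"))  # /images/logo-orange.gif
print(apply_rewrite_base("/templates/rebranding/header.png", "/images/"))  # /templates/rebranding/header.png
print(external_link(apply_rewrite_base("staff/manager/director.tiff", "/images/")))
# http://example.com/images/staff/manager/director.tiff
```

Passing the file-system path /var/www/example.com/ as the base instead reproduces the broken http://example.com/var/www/example.com/main.php link from the missing-RewriteBase example that follows.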


Usually, after getting somewhat familiar with mod_rewrite, people develop the following habit: 1) add "RewriteBase /" to every .htaccess, and 2) start every redirect target with a slash: "RewriteRule news.php /index.php?act=news". This helps get rid of RewriteBase artifacts, but it is the wrong approach. Now that we know what RewriteBase does, we can state the correct rules:
  1. RewriteBase must match the path from the site root to .htaccess.
  2. Starting redirects with "/" is necessary only when you need to specify the absolute path from the root of the site to the file.


What happens if you do not specify RewriteBase? By default, Apache sets it to the absolute file-system path of the directory containing .htaccess (for example, /var/www/example.com/templates/). This wrong assumption shows up in external relative redirects:
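# Request: http://example.com/index.php
# DocumentRoot: /var/www/example.com/
# .htaccess is in the site root, and RewriteBase is NOT specified.
# So by default RewriteBase equals the absolute path to .htaccess: /var/www/example.com/

# The string passed to RewriteRule is "index.php"
RewriteRule ^index.php main.php [R]
# Transformation: "index.php" -> "main.php"
# mod_rewrite notes that an external redirect is needed

# The RewriteRules are finished
# mod_rewrite still applies RewriteBase, since it has a default value.
# Transformation: "main.php" -> "/var/www/example.com/main.php"

# mod_rewrite builds the external redirect:
# "/var/www/example.com/main.php" -> http://example.com/var/www/example.com/main.php

# Not at all what was intended.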


So the request has gone through all the RewriteRules, and RewriteBase has been added to it if needed. Will Apache now serve the file the resulting path points to? No. The resulting request will now be processed again.

How mod_rewrite works: the [L] flag


mod_rewrite restarts request processing again and again until the request stops changing. The [L] flag cannot prevent this.

When writing more or less complex mod_rewrite configurations, it is important to understand that changes to the request do not end at the last RewriteRule. After the last RewriteRule has run and RewriteBase has been added, mod_rewrite checks whether the request has changed. If it has, processing starts over from the beginning of .htaccess.

Apache does this because the request may have been redirected to another directory while it was being changed. That directory may have its own .htaccess, which did not take part in the previous processing of the request. The new .htaccess may contain rules that affect how the request is processed, both mod_rewrite rules and rules of other modules. To handle this situation correctly, Apache has to restart the entire processing cycle.

- Wait, but there is the [L] flag, which stops mod_rewrite from processing the request!

Not quite. The [L] flag stops the current iteration of request processing. However, if the request was changed by the RewriteRules that did manage to run, Apache starts the processing cycle again from the first RewriteRule.
# Request: http://example.com/a.html

RewriteBase /

RewriteRule ^a.html$ b.html [L]
RewriteRule ^b.html$ a.html [L]


The example above results in an endless loop of redirects and, eventually, an "Internal Server Error". Here the infinite loop is obvious, but in more complex configurations you may have to dig into the rules to find out which requests bounce back and forth between each other.

To avoid such situations, it is recommended to use the [L] flag only when necessary. There are two kinds of necessity:
  1. When an external redirect is used: [L,R=301] or [L,R=302]. With an external redirect, further processing of the request is undesirable (see below about the [R] flag), so it is better to stop it.
  2. When .htaccess contains a loop that cannot be eliminated, and mod_rewrite's processing of the request has to be stopped forcibly. A special construct is used for this; see the tips on this topic at the end of the article.

The example below, however, will not loop. Try to work out why, and which file Apache will serve in the end.
# Request: http://example.com/a.html
# Start of .htaccess

RewriteBase /
RewriteRule ^a.html$ b.html
RewriteRule ^b.html$ a.html

# End of .htaccess


The answer: after all the RewriteRules have run, the request has been changed in such a way that the final result equals the original one. Apache sees this and does not start processing the request again. The file a.html is served.
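The restart behaviour can be modelled in Python: one function makes a single pass over the rules, and another repeats passes until the string stops changing. The loop limit of 10 is an assumption chosen to mirror Apache's default LimitInternalRecursion; the rules are reduced to fixed strings without backreferences, and RewriteBase is left out because it is "/" in both examples.

```python
import re


def one_pass(s, rules):
    """One pass over .htaccess. Each rule is (pattern, substitution, flags)."""
    for pattern, substitution, flags in rules:
        if re.search(pattern, s):
            s = substitution
            if "L" in flags:
                break  # [L] ends only the current pass
    return s


def process(s, rules, limit=10):
    """Repeat passes until the request stops changing."""
    for _ in range(limit):
        rewritten = one_pass(s, rules)
        if rewritten == s:
            return s  # nothing changed: this is the file Apache serves
        s = rewritten  # changed: processing starts over with the new request
    raise RuntimeError("500 Internal Server Error: too many internal redirects")


# The two .htaccess files from the examples above.
WITH_L = [(r"^a.html$", "b.html", {"L"}), (r"^b.html$", "a.html", {"L"})]
WITHOUT_L = [(r"^a.html$", "b.html", set()), (r"^b.html$", "a.html", set())]

print(process("a.html", WITHOUT_L))  # a.html
```

With WITH_L, every pass changes the string (a.html to b.html, then back), so process gives up after ten passes. With WITHOUT_L, a single pass turns a.html into b.html and back again, the string is unchanged, and a.html is returned.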

How mod_rewrite works: the [R] flag


The [R] flag does not stop request processing and return an external redirect immediately. Instead, it records that an external redirect is needed, and processing continues with the next RewriteRule. It is recommended to always use it together with the [L] flag.

The [R] flag tells Apache to perform an external redirect rather than an internal one. What is the difference? An internal redirect simply changes the path to the file that will be served to the user, while the user believes they are receiving the file they originally requested. With an external redirect, instead of the file's contents Apache returns a 301 or 302 response status together with the link the browser should use to fetch the file.

It would seem that when processing the [R] flag, Apache should immediately stop processing RewriteRules and return the external redirect to the user. However, recall the fantastical example from the section "How RewriteRule Works". There we first turned the string into an external link, which tells mod_rewrite that an external redirect is needed, and then kept changing it with subsequent RewriteRules.

This is exactly how Apache behaves when an external redirect is requested. It simply notes that after all the rules have run it must return status 302 (by default), but it keeps executing the RewriteRules further down the list. We can continue changing the request as needed; the only thing we cannot do is turn the redirect back into an internal one.

However, you are unlikely to want to change anything after an external redirect. Therefore, when using the [R] flag, it is recommended to specify it together with [L]:

# External relative redirect to BlackJack
RewriteRule ^bj/(.*) blackjack/$1 [R=301,L]

# The same kind of redirect, given as a full external link
RewriteRule ^bj/(.*) http://blackjack.example.com/$1 [L]


Instead of using the [R] flag, you can simply specify an external link. In this case Apache works out on its own that an external redirect is needed. Here too, as with an explicit [R] flag, it is recommended to add the [L] flag.
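To see why [L] matters alongside [R], here is a Python sketch in which [R] only records the redirect status while later rules keep running. The second rule, ^blackjack/(.*)$ to casino/$1, is a hypothetical addition for illustration, and the flag parsing covers only R, R=code and L.

```python
import re


def expand(match, substitution):
    """Fill in $N references from the capture groups of a successful match."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", substitution)


def parse_flags(flags):
    """Return (redirect status or None, whether [L] is set) for e.g. "R=301,L"."""
    status, last = None, False
    for flag in filter(None, flags.split(",")):
        if flag == "L":
            last = True
        elif flag == "R":
            status = 302  # a bare [R] means 302
        elif flag.startswith("R="):
            status = int(flag[2:])
    return status, last


def run_rules(s, rules):
    """Toy model: [R] only records that a redirect is needed; [L] stops the pass."""
    redirect = None
    for pattern, substitution, flags in rules:
        match = re.search(pattern, s)
        if match is None:
            continue
        s = expand(match, substitution)
        status, last = parse_flags(flags)
        if status is not None:
            redirect = status  # remembered, but the following rules still run
        if last:
            break
    return s, redirect


# Hypothetical second rule, present only to show what [L] prevents.
WITHOUT_L = [(r"^bj/(.*)", "blackjack/$1", "R=301"), (r"^blackjack/(.*)$", "casino/$1", "")]
WITH_L = [(r"^bj/(.*)", "blackjack/$1", "R=301,L"), (r"^blackjack/(.*)$", "casino/$1", "")]

print(run_rules("bj/table.html", WITHOUT_L))  # ('casino/table.html', 301)
print(run_rules("bj/table.html", WITH_L))     # ('blackjack/table.html', 301)
```

Without [L], the hypothetical second rule still rewrites the string, so the redirect target becomes casino/table.html; with [R=301,L], processing stops at blackjack/table.html.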

How mod_rewrite works: query parameters and the [QSA] flag


Changing request parameters in a RewriteRule does not change the string the next RewriteRule works with. However, it does change the %{QUERY_STRING} variable, which RewriteCond can check.

Terminology: "parameters" means request parameters; "arguments" means RewriteRule arguments.

With RewriteRule you can change not only the path to the file being processed, but also the GET parameters passed to it. This is often used to hand processing of human-readable URLs over to a common handler script, for example:
RewriteBase /

# Request: http://example.com/news/2010/07/12/grand-opening.html
# RewriteRule receives: "news/2010/07/12/grand-opening.html"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# RewriteRule: "news/2010/07/12/grand-opening.html" -> "index.php"
# %{QUERY_STRING}: "" -> "act=news&what=2010/07/12/grand-opening.html"


When RewriteRule encounters a question mark in its second argument, it understands that the request parameters are being changed. The result is as follows:
  1. RewriteRule replaces the string it works with by the part of the second argument before the question mark. Note that the new request parameters do not end up in the string that subsequent RewriteRules work with.
  2. The part of the second argument after the question mark goes into the %{QUERY_STRING} variable. If the [QSA] flag is specified, the new parameters are placed before the existing contents of %{QUERY_STRING}. If the flag is not specified, %{QUERY_STRING} is completely replaced by the parameters from the RewriteRule.
A couple more examples:
RewriteBase /

# Request: http://example.com/news/2010/?page=2
# RewriteRule receives: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# String: "news/2010/" -> "index.php"
# %{QUERY_STRING}: "page=2" -> "act=news&what=2010/"


Most likely the rule above does not work as intended, because the page parameter is lost. Let's fix that:
RewriteBase /

# Request: http://example.com/news/2010/?page=2
# RewriteRule receives: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1 [QSA]
# String: "news/2010/" -> "index.php"
# %{QUERY_STRING}: "page=2" -> "act=news&what=2010/&page=2"


We added only the [QSA] flag, and the rule began to work correctly.
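Both cases above can be reproduced with a short Python model of how a "?" in the substitution splits the result. It assumes the $N references have already been filled in and models only the [QSA] flag; the function name rewrite_with_query is illustrative.

```python
def rewrite_with_query(substitution, query_string, qsa=False):
    """Toy model of a "?" in the second argument of RewriteRule.

    Returns (string for the next RewriteRule, new %{QUERY_STRING}).
    `substitution` is assumed to have its $N references already filled in.
    """
    path, has_query, new_query = substitution.partition("?")
    if not has_query:
        return path, query_string  # no "?": the original parameters are kept
    if qsa and query_string:
        # [QSA]: the new parameters go first, the original ones are appended
        return path, f"{new_query}&{query_string}" if new_query else query_string
    return path, new_query  # without [QSA] the original parameters are dropped


print(rewrite_with_query("index.php?act=news&what=2010/", "page=2"))
# ('index.php', 'act=news&what=2010/')
print(rewrite_with_query("index.php?act=news&what=2010/", "page=2", qsa=True))
# ('index.php', 'act=news&what=2010/&page=2')
```

In both calls the string handed to the next RewriteRule is just "index.php"; only the query string differs.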

It is important to understand that changing query parameters changes %{QUERY_STRING}, which may be used later in RewriteCond. Keep this in mind when writing subsequent rules that check parameters.

- Of course it changes: the request goes back to Apache for reprocessing!

No, %{QUERY_STRING} changes immediately. I won't give the proof here; more has already been written about parameters than is interesting to read :)

How can you check in RewriteCond exactly the request parameters the user sent, rather than those modified by RewriteRules? See the tips at the end of the article.

RewriteCond and performance


mod_rewrite first checks whether the request matches the RewriteRule pattern, and only then checks the additional RewriteCond conditions.

A few words about the order in which mod_rewrite executes directives. Since in .htaccess RewriteCond comes first and RewriteRule after it, it seems that mod_rewrite first checks all the conditions and then proceeds to run the RewriteRule.

In fact, it is the other way around. mod_rewrite first checks whether the current value of the request matches the RewriteRule regular expression, and only then checks all the conditions listed in RewriteCond.

So if your RewriteRule has a regular expression two pages long and, with performance in mind, you decided to restrict that rule with additional RewriteConds, be aware that this will not help. In that case it is better to use the RewriteRule flags [C] or [S] to skip the more complex rule when the simpler checks fail.
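A Python sketch of this evaluation order: the pattern is tested first, and the conditions are consulted only when it matches. The condition here is a stand-in that merely records each call; a real RewriteCond would test variables such as %{HTTP_HOST}. Because the pattern is always tested, putting a RewriteCond above a heavy pattern does not save its cost.

```python
import re

condition_checks = []  # records every evaluation of the stand-in "RewriteCond"


def costly_condition(s):
    """Stand-in for a RewriteCond; it only records that it was evaluated."""
    condition_checks.append(s)
    return True


def rule_applies(s, pattern, conditions):
    """Toy model: the RewriteRule pattern is tested first, conditions second."""
    if not re.search(pattern, s):
        return False  # the conditions are never evaluated
    return all(cond(s) for cond in conditions)


rule_applies("about.html", r"^news/", [costly_condition])   # pattern fails, no condition check
rule_applies("news/1.html", r"^news/", [costly_condition])  # pattern matches, condition checked
print(condition_checks)  # ['news/1.html']
```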

RewriteCond variables and flags, other RewriteRule flags, etc.


Read the documentation.

We have covered how RewriteRule, RewriteBase, and the [L], [R] and [QSA] flags work, and examined the request processing mechanism inside mod_rewrite. What remains untouched: the other RewriteRule flags and the RewriteCond and RewriteMap directives.

Fortunately, these directives and flags hold no mysteries and work exactly as most tutorials describe. Reading the official documentation is enough to understand them. First of all, I recommend studying the list of variables that can be checked in RewriteCond: %{QUERY_STRING}, %{THE_REQUEST}, %{REMOTE_ADDR}, %{HTTP_HOST}, %{HTTP:header}, etc.

Differences between mod_rewrite in the .htaccess context and in the VirtualHost context


In the context of <VirtualHost>, mod_rewrite works exactly the opposite.

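As I said at the beginning of the article, everything described above applies to mod_rewrite in the .htaccess context. If mod_rewrite is used in <VirtualHost>, it works differently:
  1. RewriteRule receives the whole URL path of the request, from the first slash up to the start of the query parameters: "http://example.com/news/category/post.html?comments_page=3" -> "/news/category/post.html". This string always starts with "/".
  2. The second argument of RewriteRule must also start with "/" (or be a full URL); otherwise the result is a "Bad Request" error.
  3. RewriteBase makes no sense here.
  4. The rules are run only once. The [L] flag really does end processing of all the rules in <VirtualHost>, with no further iterations.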

Tips and solutions


Below are tips that could have been given in the course of the article but were left out of the main text for brevity.

Writing regular expressions


Try to write regular expressions that match as narrowly as possible exactly the requests you want to modify, so that RewriteRules do not accidentally fire for other requests. For example:
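# Use "^" (start of string)
# and "$" (end of string):
RewriteRule ^news\.php$ index.php
# Or, for a whole section:
RewriteRule ^news/(.*)$ index.php

# Even better, describe the expected request format precisely.
# Use specific character classes instead of a catch-all ".".
# That way, unrelated requests will not match the rule.
# Don't forget to escape "." (the dot).
# Matches, for example, http://example.com/news/2009/07/28/b-effect.html
RewriteRule ^news/20[0-9]{2}/[0-9]{2}/[0-9]{2}/[^/]+\.html index.php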
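A quick way to see how loose patterns over-match is to test them against a few candidate strings. Python's re module is used here as a stand-in for Apache's PCRE, which behaves the same for these patterns; the sample strings are made up.

```python
import re


def matches(pattern, s):
    """True if the pattern matches anywhere in s, as RewriteRule does."""
    return re.search(pattern, s) is not None


LOOSE = r"news.php"          # no anchors, and "." matches any character
ANCHORED = r"^news\.php$"    # whole string only, literal dot
DATED = r"^news/20[0-9]{2}/[0-9]{2}/[0-9]{2}/[^/]+\.html"  # from the example above

for s in ["news.php", "old-news.php", "news-php",
          "news/2009/07/28/b-effect.html", "news/archive/b-effect.html"]:
    print(f"{s:32} loose={matches(LOOSE, s)} anchored={matches(ANCHORED, s)} dated={matches(DATED, s)}")
```

The loose pattern also catches "old-news.php" and "news-php"; the anchored and dated patterns accept only the intended shapes.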


That said, a certain well-known site has a whole section on regular expressions.

Changing external redirects


Although mod_rewrite lets you use RewriteRule to change even external redirects, right down to the protocol, I strongly recommend against it. The article changes external redirects only to move away from notions such as "links" and "files" and to show more clearly that RewriteRule works with a plain string.

I don't think the developers of mod_rewrite expected anyone to do this, so all kinds of artifacts are possible. Please don't do it.

How to stop an endless loop


Sometimes the logic of redirections on a site is such that without special actions mod_rewrite perceives them as an infinite loop of redirections. Take the following example.

The site had a page /info.html. An SEO specialist decided that search engines would index this page better if it were called /information.html, and asked for an external redirect from info.html to information.html. However, for whatever reason, the site developer cannot simply rename info.html to information.html and set up the redirect: the data must be served directly from the info.html file. He writes the following rules:
# External redirect to the new name
RewriteRule ^info.html information.html [R,L]
# Serve /information.html from info.html
RewriteRule ^information.html info.html


... and runs into an endless loop. Every request for /information.html gets an external redirect to /information.html again.

This problem can be solved in at least two ways. One of them has already been described on Habr: set an environment variable and stop the redirects based on its value. The code looks like this:
RewriteCond %{ENV:REDIRECT_FINISH} !^$
RewriteRule ^ - [L]

RewriteRule ^info.html$ information.html [R,L]
RewriteRule ^information.html$ info.html [E=FINISH:1]


Note that after the internal redirect, the variable name gets the prefix "REDIRECT_": FINISH becomes REDIRECT_FINISH.

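The second way is to check %{THE_REQUEST} to see what exactly the user requested:
# Redirect only if the user actually requested info.html.
# If info.html was reached through an internal rewrite, do not redirect.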
RewriteCond %{THE_REQUEST} "^(GET|POST|HEAD) /info.html HTTP/[0-9.]+$"
RewriteRule ^info.html$ information.html [R,L]

RewriteRule ^information.html$ info.html
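The second approach works because %{THE_REQUEST} holds the browser's original request line, which internal rewrites do not change, while the string RewriteRule sees does change. Here is a Python sketch of the condition pair; the request lines are assumed examples and the function name is illustrative.

```python
import re

# The RewriteCond pattern from the example above.
THE_REQUEST_RE = r"^(GET|POST|HEAD) /info.html HTTP/[0-9.]+$"


def needs_external_redirect(the_request, current):
    """Toy model of the RewriteCond + RewriteRule pair above.

    `the_request` is the browser's original request line, which internal
    rewrites leave untouched; `current` is the string RewriteRule sees.
    """
    if re.search(r"^info.html$", current) is None:
        return False  # the RewriteRule pattern is checked first
    return re.search(THE_REQUEST_RE, the_request) is not None


# Direct request for /info.html: redirect.
print(needs_external_redirect("GET /info.html HTTP/1.1", "info.html"))         # True
# /information.html internally rewritten to info.html: no redirect, no loop.
print(needs_external_redirect("GET /information.html HTTP/1.1", "info.html"))  # False
```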



Analyzing the original user request: keeping Apache from decoding links


While processing a request, Apache decodes URL-encoded characters in the original request. In some cases this is undesirable, because the developer wants to check the original, unmodified user request. You can do this by checking the %{THE_REQUEST} variable in RewriteCond:
RewriteCond %{THE_REQUEST} ^GET[\ ]+/tag/([^/]+)/[\ ]+HTTP.*$
RewriteRule ^(.*)$ index.php?tag=%1 [L]


One such case is discussed on Habr, and the example above is taken from that discussion.
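Why the undecoded form can matter is easiest to see with a character that has a special meaning in query strings. In this Python sketch, the request line for the tag "c++" is a hypothetical example, and it assumes, as described above, that the path RewriteRule sees has already been decoded. If the decoded tag were substituted into index.php?tag=..., the "+" signs would be read as spaces by a typical form-data parser, such as parse_qs here.

```python
import re
from urllib.parse import parse_qs, unquote

# Hypothetical request line for the tag "c++", with "+" percent-encoded.
THE_REQUEST = "GET /tag/c%2B%2B/ HTTP/1.1"

match = re.search(r"^GET[\ ]+/tag/([^/]+)/[\ ]+HTTP.*$", THE_REQUEST)
raw_tag = match.group(1)        # "c%2B%2B": still encoded, as in %{THE_REQUEST}
decoded_tag = unquote(raw_tag)  # "c++": what the decoded path would contain

# Substituted into "index.php?tag=...", the two strings mean different things:
print(parse_qs("tag=" + raw_tag))      # {'tag': ['c++']}
print(parse_qs("tag=" + decoded_tag))  # {'tag': ['c  ']}  ("+" reads as a space)
```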

Recommended Documentation


The official Apache documentation, and especially the Technical Details page. Yes, really.

Thank you very much for your attention!

Source: https://habr.com/ru/post/129560/

