
Caching proxy server on nginx. Tricky configuration

Habr already has several articles about nginx, but I think my configuration will be interesting as well.
The situation is as follows: there is a website (an online store) hosted on several IIS servers, with a load balancer in front of them. We decided to put nginx between the balancer and IIS to reduce the load on IIS.

Most of the dynamic content is loaded via Ajax, so caching product pages is fairly safe. However, product pages may contain reviews that users can vote on, just like on Habr, and this also has to be taken into account.

On top of that, I want the popular pages in the cache to be kept up to date automatically.

So, first install a recent nginx; an old version will not work. We also need wget and curl.
I will not describe the proxy configuration itself in detail and will focus instead on the methods of keeping the cache up to date.
Nginx configuration itself:


worker_processes 4;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    # Cache storage: path, directory levels, key zone and size limits.
    proxy_cache_path /var/cache/nginx levels=2:2 keys_zone=STATIC:512m inactive=24h max_size=32g;

    # Log format used by the cache scripts. The field separator is |.
    # log_format is only allowed at the http level, so it is declared here.
    log_format cache '$remote_addr|$time_local|$request_uri|$request_method|$status|$http_referer|$cookie___sortOrder|$IsAuth|$sent_http_content_type|$http_user_agent';

    sendfile on;
    keepalive_timeout 65;

    gzip on;
    gzip_proxied any;
    gzip_min_length 1100;
    gzip_http_version 1.0;
    gzip_buffers 4 8k;
    gzip_comp_level 9;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript application/json;

    server {
        server_name 127.0.0.1;
        listen 80;
        set $backend 127.0.0.2;

        access_log /var/log/nginx/proxy_access.log cache;
        # "error_log off" would write to a file named "off", so send errors to /dev/null instead.
        error_log /dev/null crit;

        # Pages that must never be cached.
        location ~* /(basket.aspx|visitedgoods.aspx|users/|sale/order.aspx|sale/posted.aspx) {
            proxy_pass http://$backend;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_pass_header Set-Cookie;
            proxy_redirect off;

            set $IsAuth 1;
            if ($cookie_AUTH = "") {
                set $IsAuth 0;
            }
        }

        location / {
            # Strip the | character (raw or encoded as %7C) from the query string and from the URI.
            if ($args ~* (.*)\|(.*)) {
                set $brand $1$2;
                rewrite ^(.*) $1?$brand? redirect;
            }
            if ($args ~* (.*)\%7C(.*)) {
                set $brand $1$2;
                rewrite ^(.*) $1?$brand? redirect;
            }
            rewrite ([a-zA-Z0-9]+)\|([a-zA-Z0-9]+) $1$2? permanent;
            rewrite ([a-zA-Z0-9]+)\%7C([a-zA-Z0-9]+) $1$2? permanent;
            rewrite (.*)\%7C$ $1? permanent;
            rewrite (.*)\| $1? permanent;

            proxy_pass http://$backend;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_pass_header Set-Cookie;
            proxy_ignore_headers "Expires" "Cache-Control" "Set-Cookie";

            # Check whether the user is authorized. The flag goes into the log and into the cache key.
            set $IsAuth 1;
            if ($cookie_AUTH = "") {
                set $IsAuth 0;
            }

            # Check whether the request comes from our daemon. If so, bypass the cache.
            set $DoBypass 0;
            if ($http_user_agent = "WGET-POST-daemon") {
                set $DoBypass 1;
            }

            # The __sortOrder cookie and the authorization flag are part of the cache key.
            proxy_cache STATIC;
            proxy_cache_key "$host$uri$is_args$args $cookie___sortOrder $IsAuth";
            proxy_cache_valid 200 301 302 304 30m;
            proxy_cache_bypass $DoBypass;
            proxy_cache_use_stale error timeout invalid_header updating;
            proxy_connect_timeout 100ms;
            proxy_redirect off;
        }
    }
}
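To find where a given page ends up on disk, it helps to know that nginx names each cache file after the MD5 hash of the proxy_cache_key value, and that with levels=2:2 the two directory names are taken from the end of that hash. The sketch below reproduces this for the key used in the configuration. The function names cache_key and cache_path are made up for illustration, and the sketch assumes that md5sum is installed.

```shell
#!/bin/sh
# Sketch only: rebuild the cache key from the config and map it to a
# file under /var/cache/nginx. Function names are hypothetical.

# Mirrors: proxy_cache_key "$host$uri$is_args$args $cookie___sortOrder $IsAuth"
# $1 host, $2 uri, $3 query string (may be empty), $4 __sortOrder cookie, $5 IsAuth
cache_key() {
    if [ -n "$3" ]; then
        printf '%s%s?%s %s %s' "$1" "$2" "$3" "$4" "$5"
    else
        printf '%s%s %s %s' "$1" "$2" "$4" "$5"
    fi
}

# levels=2:2: first directory = last two hex digits of the MD5,
# second directory = the two digits before them.
cache_path() {
    hash=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
    l1=$(printf '%s' "$hash" | cut -c31-32)
    l2=$(printf '%s' "$hash" | cut -c29-30)
    printf '/var/cache/nginx/%s/%s/%s\n' "$l1" "$l2" "$hash"
}

# Anonymous request without a __sortOrder cookie and without arguments.
cache_path "$(cache_key 127.0.0.1 /catalog/25/women.aspx '' '' 0)"
```

Note that an absent cookie contributes an empty string to the key, so the key for the request above contains two consecutive spaces.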


So, the proxy is ready. Now the fun part.

We want to keep the 20,000 most visited pages of the last day in the cache. The log looks like this:

127.0.0.3|20/Oct/2011:15:45:43 +0400|/catalog/25/women.aspx|GET|200|http://127.0.0.1/|-|0|text/html; charset=windows-1251|Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.9.168 Version/11.51
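Because the format is pipe-delimited, every field can be picked out by its position. As a quick illustration, the sketch below extracts the fields that the scripts in this article rely on. The function name log_fields is hypothetical.

```shell
#!/bin/sh
# Field positions in the "cache" log format:
# 1 client IP, 2 time, 3 URI, 4 method, 5 status, 6 referer,
# 7 __sortOrder cookie, 8 IsAuth, 9 content type, 10 user agent
log_fields() {
    awk -F'|' '{ print "url=" $3, "status=" $5, "sort=" $7, "auth=" $8 }'
}

echo '127.0.0.3|20/Oct/2011:15:45:43 +0400|/catalog/25/women.aspx|GET|200|http://127.0.0.1/|-|0|text/html; charset=windows-1251|Opera/9.80' | log_fields
```

A "-" in the seventh field means the request carried no __sortOrder cookie.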


Logs are rotated every hour and kept for a day. Unfortunately, logrotate cannot rotate logs more often than once a day on its own, so a small hack is applied: size 1 in the configuration file, and logrotate -f /etc/nginx.rotate run from cron once an hour.

Script to create a list of the most visited pages:

#!/bin/bash

# Find our own IP so that requests made by these scripts are excluded from the statistics.
ourIP=$(ifconfig | grep 'inet addr:' | grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1 }')

# Log in to the site to obtain the AUTH cookie.
curl -H "Cookie: BasketID=KYKY-B-PYKY" -H "Content-Type: application/json" \
     -d "{\"login\":\"USERNAME\",\"password\":\"PASS\",\"remeberMe\":\"true\"}" \
     -c /var/log/nginx/cookies1.txt \
     http://127.0.0.1/Resources/Services/SystemService.asmx/SignIn
line=$(tail -n 1 /var/log/nginx/cookies1.txt)
AUTH=$(echo "$line" | awk '{ print $7 }')

# Merge the current log and the rotated archives into one file.
cp /var/log/nginx/proxy_access.log /var/log/nginx/overall_proxy
for f in /var/log/nginx/*.gz
do
    gunzip -c "$f" >> /var/log/nginx/overall_proxy
done

# Remove the old wget task files.
rm -f /var/log/nginx/wget/*

# Keep successful (200) text/html requests that did not come from us and are not
# excluded from caching, count them, take the 20000 most popular ones and turn
# each of them into a line of wget arguments.
awk -F'|' -v ourIP="$ourIP" '{
    ip = $1; url = $3; code = $5; catsrt = $7; isauth = $8; ct = $9
    if (ip != ourIP &&
        url !~ "basket.aspx" && url !~ "visitedgoods.aspx" && url !~ "users/" &&
        url !~ "sale/order.aspx" && url !~ "sale/posted.aspx" &&
        code == "200" && ct ~ "text/html;")
        print "http://" ourIP url "|" catsrt "|" isauth
}' /var/log/nginx/overall_proxy |
    sort | uniq -c | sort -n | tail -n 20000 | awk '{ print $2 }' |
    awk -F'|' -v AUTH="$AUTH" '
        $2 == "-" { $2 = "" }
        $3 == "0" { $3 = "" }
        $3 == "1" { $3 = AUTH }
        { print "-b --header=\"Cookie: __sortOrder=" $2 "; AUTH=" $3 "\" -o /dev/null -O /dev/null " $1 }
    ' > /var/log/nginx/cache.dat

rm -f /var/log/nginx/overall_proxy

# Split the list into 10 files of 2000 lines each, one file per download thread.
cd /var/log/nginx/wget
split -l 2000 /var/log/nginx/cache.dat
rm -f /var/log/nginx/cache.dat
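The ranking step of this script is the classic sort, uniq -c, sort -n, tail pipeline. A stripped-down sketch of just that step, with a hypothetical function name top_n and made-up input, shows the order in which the entries come out: the most popular entry is printed last.

```shell
#!/bin/sh
# top_n N: read one item per line on stdin and print the N most frequent
# items in ascending order of frequency. Items must not contain spaces,
# because awk takes the second whitespace-separated field of "count item".
top_n() {
    sort | uniq -c | sort -n | tail -n "$1" | awk '{ print $2 }'
}

printf '%s\n' /a.aspx /b.aspx /a.aspx /c.aspx /a.aspx /b.aspx | top_n 2
```

For the sample input this prints /b.aspx and then /a.aspx, since /c.aspx was requested only once and falls outside the top two.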


As a result we get files in which every line is a set of wget arguments:
-b --header="Cookie: __sortOrder=; AUTH=" -o /dev/null -O /dev/null http://127.0.0.1/catalog/25/women.aspx
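To make the mapping from a ranked entry to such a line explicit, here is a minimal sketch of the last awk stage on its own. The function name to_wget_args and the sample AUTH value are made up for illustration.

```shell
#!/bin/sh
# to_wget_args AUTH: read "url|sortOrder|isAuth" lines on stdin and
# print one line of wget arguments per entry.
to_wget_args() {
    awk -F'|' -v AUTH="$1" '
        $2 == "-" { $2 = "" }      # "-" in the log means the cookie was absent
        $3 == "0" { $3 = "" }      # anonymous request: empty AUTH cookie
        $3 == "1" { $3 = AUTH }    # authorized request: use the cookie from our login
        { print "-b --header=\"Cookie: __sortOrder=" $2 "; AUTH=" $3 "\" -o /dev/null -O /dev/null " $1 }'
}

echo 'http://127.0.0.1/catalog/25/women.aspx|-|0' | to_wget_args SAMPLE
```

The cookie values matter because they are part of the cache key: a page has to be requested with the same __sortOrder and authorization state as the visitors who made it popular, otherwise a different cache entry would be refreshed.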

Then, every 20 minutes, we go through this list and request each page from the server so that the cached copies stay valid.

#!/bin/bash
FILES=/var/log/nginx/wget/*

# Kill the wget jobs left over from the previous run.
if [ -s /var/log/nginx/wgets.pid ]
then
    xargs kill < /var/log/nginx/wgets.pid
    rm -f /var/log/nginx/wgets.pid
fi

# Start one wget job per task file.
for f in $FILES
do
    xargs wget < "$f" &
    echo $! >> /var/log/nginx/wgets.pid
done


All that remains is to update the cached page when someone votes for a comment. This is handled by a daemon that constantly watches the log for votes.

#!/bin/bash

# Log in to the site.
curl -H "Cookie: BasketID=KYKY-B-PYKY-U-ATAC" -H "Content-Type: application/json" \
     -d "{\"login\":\"USERNAME\",\"password\":\"PASS\",\"remeberMe\":\"true\"}" \
     -c /var/log/nginx/cookies.txt \
     http://127.0.0.1/Resources/Services/SystemService.asmx/SignIn
line=$(tail -n 1 /var/log/nginx/cookies.txt)
AUTH=$(echo "$line" | awk '{ print $7 }')

# Make sure the daemon is not already running.
if [ -f /var/log/nginx/post-daemon.pid ]; then
    echo "POST-daemon already running!"
    exit
fi

# Follow the nginx log and react to every successful vote.
(/usr/bin/tail -f /var/log/nginx/proxy_access.log & echo $! > /var/log/nginx/post-daemon.pid) |
while read -r line
do
    if [[ $line =~ '/Resources/Services/SystemService.asmx/VoteToComment|POST|200' ]]; then
        # Build the wget request from the fields of the log line.
        ref=$(echo "$line" | awk -F'|' '{ print $6 }')
        sortOrder=$(echo "$line" | awk -F'|' '{ print $7 }')
        IsAuth=$(echo "$line" | awk -F'|' '{ print $8 }')
        if [[ $sortOrder == "-" ]]; then
            sortOrder=""
        fi
        # Refetch the page as an anonymous or as an authorized user, depending on who voted.
        # The User-Agent is checked in the nginx config to bypass the cache and store a fresh copy.
        if [[ $IsAuth == "0" ]]; then
            wget --user-agent="WGET-POST-daemon" --header="Cookie: __sortOrder=$sortOrder" -o /dev/null -O /dev/null "$ref"
        else
            wget --user-agent="WGET-POST-daemon" --header="Cookie: __sortOrder=$sortOrder; AUTH=$AUTH" -o /dev/null -O /dev/null "$ref"
        fi
    fi
done
exit
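The core of the daemon is the decision whether a log line is a successful vote and, if so, which page has to be refetched. A minimal sketch of that check as a standalone function could look like this. The name vote_referer is hypothetical, and the sample line is made up in the log format shown earlier.

```shell
#!/bin/sh
# vote_referer LINE: if LINE is a successful POST to VoteToComment,
# print "referer|sortOrder|isAuth" and return 0; otherwise return 1.
vote_referer() {
    case "$1" in
        *'/Resources/Services/SystemService.asmx/VoteToComment|POST|200|'*)
            printf '%s\n' "$1" | awk -F'|' '{
                so = ($7 == "-") ? "" : $7   # "-" means no cookie was sent
                print $6 "|" so "|" $8
            }'
            ;;
        *)
            return 1
            ;;
    esac
}

vote_referer '127.0.0.3|20/Oct/2011:15:45:43 +0400|/Resources/Services/SystemService.asmx/VoteToComment|POST|200|http://127.0.0.1/catalog/25/women.aspx|-|1|application/json|Opera/9.80'
```

The page to refresh is taken from the referer field, because the vote itself is an Ajax POST to the service URL, while the cached entry belongs to the product page the vote was cast from.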


Configuration for logrotate:

/var/log/nginx/*log {
    daily
    rotate 24
    size 1
    missingok
    notifempty
    compress
    postrotate
        /etc/init.d/nginx reload
        /etc/init.d/nginx-POSTcache restart
    endscript
}


The init script for our daemon:

#!/bin/sh
#
# This script starts and stops the nginx cache updater daemon
#
# chkconfig: - 85 15
#
# processname: post-daemon
# pidfile: /var/log/nginx/post-daemon.pid

. /etc/rc.d/init.d/functions

daemon="/usr/local/sbin/post-daemon.sh"
pidfile="/var/log/nginx/post-daemon.pid"
prog=$(basename $daemon)

start() {
    [ -x $daemon ] || exit 5
    echo -n $"Starting POST-daemon: "
    ($daemon &) &
    retval=$?
    echo
    [ $retval -eq 0 ]
    return $retval
}

stop() {
    echo -n $"Stopping POST-daemon: "
    pid=$(cat $pidfile)
    kill $pid
    # Remember the result of kill itself, not of the cleanup that follows.
    retval=$?
    rm -f $pidfile
    echo
    return $retval
}

restart() {
    stop
    start
}

rh_status() {
    status $prog
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        rh_status
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}"
        exit 2
esac


The solution is a little clumsy, but it works. The load on IIS dropped, and pages are now served to clients faster. Note that all the images of the site live on a separate server, so they do not need to be cached here.

Source: https://habr.com/ru/post/131146/

