
Accelerating the delivery of images


Every sysadmin sooner or later runs into the problem of static content being served slowly.

It looks roughly like this: a 3 KB picture suddenly loads as if it weighed 3 MB, and CSS and JavaScript start to "stick" (come through very slowly) out of nowhere. You hit Ctrl+Reload and the problem seems to be gone, only for everything to repeat a few minutes later.

The true cause of the slowdown is not always obvious, and we squint suspiciously at nginx, at the hosting provider, at a "clogged" channel, or at a "slow" or "buggy" browser :)
In fact, the problem is the imperfection of the modern hard drive, which still has not parted with its mechanical subsystems: spindle rotation and head positioning.

In this article I will share my solution to this problem, based on practical experience of using SSD drives together with the nginx web server.


How do you tell that the hard drive is the bottleneck?


In Linux, problems with disk subsystem speed show up in the iowait parameter (the percentage of CPU time spent idle while waiting for I/O operations). There are several commands for monitoring it: mpstat, iostat, sar. I usually run iostat 5 (measurements are taken every 5 seconds).
I am not worried about a server whose average iowait stays below 0.5%. On your distribution server this figure is most likely higher. It makes sense not to put off optimization once iowait > 10%: the system is spending a lot of time moving the drive heads around instead of reading data, and this can slow down other processes on the server as well.
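
For example, a quick way to keep an eye on iowait (assuming the sysstat package, which provides iostat, is installed):

 iostat -x 5
 # the %iowait column in the avg-cpu block is the figure discussed above;
 # the per-device %util column shows how saturated each disk is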

How to deal with high iowait?


Obviously, if you reduce the number of disk I/O operations, the hard drive will have an easier time and iowait will drop.
Here are some recommendations:

These measures will help a little and buy you time until an upgrade. If the project keeps growing, iowait will soon remind you of itself. :)

Upgrading the hardware



Upgrading the CPU will have absolutely no effect on delivery speed, because the CPU is not what is slowing things down! :)

Why SSD


A year and a half ago, when I wrote the article “Tuning nginx”, one of the nginx acceleration options I suggested was using SSD drives. The Habr community showed restrained interest in the technology: there were reports of SSDs degrading over time and concerns about the limited number of rewrite cycles.
Very soon after the article was published, our company got a Kingston SNE125-S2/64GB (based on the Intel X25-E SSD), which is still in use on one of our most heavily loaded distribution servers.

After a year of experimenting, a number of drawbacks emerged that I would like to mention:
Why I use SSDs anyway:

Configuring SSD cache


File system selection
At the beginning of the experiment, ext4 was installed on the Kingston SNE125-S2/64GB. On the Internet you will find plenty of recommendations on how to "chop off" journaling, last-access timestamps, and so on. Everything worked fine, and for a long time. The main thing that did not suit me was that with a large number of small photos (1-5K each), less than half of the 64G SSD could actually be filled: only ~20G of files fit. I began to suspect that my SSD was not being used efficiently.
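
For reference, a hypothetical example of what those recommendations usually boil down to (the partition name, UUID and mount point here are placeholders, not my actual setup):

 tune2fs -O ^has_journal /dev/sdb1    # turn off the ext4 journal (run on an unmounted filesystem)
 # and in /etc/fstab, disable access-time updates:
 UUID=<your-uuid> /var/www/ssd ext4 noatime,nodiratime 0 2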

I upgraded the kernel to 2.6.35 and decided to try the (then still experimental) btrfs, which lets you indicate at mount time that it is running on an SSD. The drive does not have to be split into partitions in the usual way; it can be formatted as a whole.

Example:
mkfs.btrfs /dev/sdb 

At mount time you can disable many features we do not need and enable compression of files and metadata. (In practice, JPEGs will not be compressed: btrfs is smart enough to compress only the metadata.) Here is what my fstab mount line looks like (all on one line):

UUID=7db90cb2-8a57-42e3-86bc-013cc0bcb30e /var/www/ssd btrfs device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,noatime,ssd,nobarrier,compress,nodatacow,nodatasum,noacl,notreelog 1 2

You can get the formatted disk UUID using the command:
 blkid /dev/sdb 


As a result, more than 41G fit on the drive (twice as much as on ext4). At the same time, delivery speed did not suffer, since iowait did not increase.

Assembling a RAID from SSDs
The moment came when 64G of SSD was no longer enough. I wanted to combine several SSDs into one large volume, and at the same time to use not only expensive SLC drives but also ordinary MLC SSDs. A bit of theory is needed here:

Btrfs stores three types of data on a disk: information about the filesystem itself, metadata (there are always two copies of metadata on the disk), and the actual data (file contents). Experimentally, I found that in our directory structure the "compressed" metadata occupies ~30% of all data in the volume. Metadata is the most intensively rewritten kind of block, since any file addition, file move, or change of access rights causes a metadata block to be rewritten; the area where file contents are stored is rewritten far less often. And here we come to the most interesting capability of btrfs: it can build software RAID arrays and lets you state explicitly which drives hold the data and which hold the metadata.
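
If you want to measure this split on your own filesystem, one way (assuming it is mounted at /var/www/ssd) is:

 btrfs filesystem df /var/www/ssd
 # prints separate totals for Data, Metadata and System,
 # showing how much of the volume the metadata actually occupies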

Example:
 mkfs.btrfs -m single /dev/sdc -d raid0 /dev/sdb /dev/sdd 

As a result, the metadata will be created on /dev/sdc and the data on /dev/sdb and /dev/sdd, which are combined into a striped RAID. Moreover, you can attach more disks to an existing filesystem, rebalance the data, and so on.
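
For example, adding another drive to a mounted btrfs filesystem and rebalancing the data across all devices might look like this (the device name and mount point are placeholders):

 btrfs device add /dev/sde /var/www/ssd
 btrfs filesystem balance /var/www/ssd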

To find out the UUID of the btrfs RAID, run:
 btrfs device scan 

Attention, a peculiarity of working with a btrfs RAID: after loading the btrfs module and before each mount of the RAID array, you must run the command btrfs device scan. To mount automatically via fstab, you can do without 'btrfs device scan' by adding the device= options to the mount line. Example:
 /dev/sdb /mnt btrfs device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,device=/dev/sde 


Caching on nginx without proxy_cache


I assume you have a storage server that holds all the content; it has plenty of space and ordinary "slow" SATA hard drives, which cannot withstand a large volume of concurrent requests.
Between the storage server and the site's users there is a "distribution" server, whose task is to take the load off the storage server and ensure uninterrupted delivery of static files to any number of clients.

We install one or more SSDs with btrfs on board into the distribution server. An nginx configuration based on proxy_cache immediately suggests itself, but it has a few drawbacks for our setup:

We will take a different approach to caching; the idea came up at one of the HighLoad conferences. In the cache volume we create two directories, cache0 and cache1. During proxying, all files are saved into cache0 (using proxy_store). nginx is configured to look for a file (and serve it to the client) first in cache0, then in cache1; if the file is not found in either, nginx goes to the storage server for it and then stores it in cache0.
After some time (a week/month/quarter), we delete cache1, rename cache0 to cache1, and create an empty cache0. We then analyze the access log for cache1, and the files that are still being requested from it are hard-linked into cache0.

This approach significantly reduces write operations on the SSD, since creating a hard link writes far less than rewriting the whole file. In addition, you can assemble a RAID from several SSDs, one of which is an SLC drive for the metadata while ordinary MLC SSDs hold the regular data. (On our system metadata takes up about 30% of the total data.) When re-linking, only metadata is rewritten!

Nginx configuration example
 log_format cache0 '$request';
 # ...
 server {
     expires max;

     location / {
         root /var/www/ssd/cache0/;
         try_files $uri @cache1;
         access_log off;
     }

     location @cache1 {
         root /var/www/ssd/cache1;
         try_files $uri @storage;
         access_log /var/www/log_nginx/img_access.log cache0;
     }

     location @storage {
         proxy_pass http://10.1.1.1:8080$request_uri;
         proxy_store on;
         proxy_store_access user:rw group:rw all:r;
         proxy_temp_path /var/www/img_temp/;   # temporary files live on a regular disk, not on the SSD!
         root /var/www/ssd/cache0/;
         access_log off;
     }
     # ...
 }


Scripts for cache0 and cache1 rotation
I wrote several bash scripts to help you implement the rotation scheme described above. If your cache size is measured in hundreds of gigabytes and the number of files in the cache is in the millions, it makes sense to run the ria_ssd_cache_mover.sh script several times in a row right after rotation, with a command like this:
 for i in `seq 1 10`; do ria_ssd_cache_mover.sh; done; 
Determine experimentally how long this command takes to run; for me it ran for almost a day. The next day, I set up cron to launch ria_ssd_cache_mover.sh every hour.
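
As an illustration only, here is a minimal sketch of what the rotation and the hard-linking "mover" could look like, using the paths and the cache0 log format from the nginx example above; it is not the author's ria_ssd_cache_mover.sh, and the script name and paths are assumptions:

 #!/bin/bash
 # cache_rotate.sh - sketch of the cache0/cache1 rotation scheme
 CACHE_ROOT=/var/www/ssd
 LOG=/var/www/log_nginx/img_access.log

 rotate() {
     # run once per week/month/quarter:
     # drop the old cache1, demote cache0, start with an empty cache0
     rm -rf "$CACHE_ROOT/cache1"
     mv "$CACHE_ROOT/cache0" "$CACHE_ROOT/cache1"
     mkdir "$CACHE_ROOT/cache0"
     : > "$LOG"                     # start collecting hits against the new cache1
 }

 move_hot_files() {
     # run repeatedly (e.g. hourly from cron): hard-link files that are
     # still requested from cache1 into cache0; the log holds $request
     # lines such as "GET /img/a.jpg HTTP/1.1"
     awk '{print $2}' "$LOG" | sort -u | while read -r uri; do
         uri=${uri%%\?*}            # strip any query string
         src="$CACHE_ROOT/cache1$uri"
         dst="$CACHE_ROOT/cache0$uri"
         if [ -f "$src" ] && [ ! -e "$dst" ]; then
             mkdir -p "$(dirname "$dst")"
             ln "$src" "$dst"       # hard link: only metadata is written to the SSD
         fi
     done
 }

 case "$1" in
     rotate) rotate ;;
     move)   move_hot_files ;;
     *)      echo "usage: $0 {rotate|move}" ;;
 esac

An hourly crontab entry for the "move" pass could then look like 0 * * * * /path/to/cache_rotate.sh move (again, the script name and location are hypothetical).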

DoS protection for the storage server
If the storage server is on the weak side and there are ill-wishers eager to hammer your system, you can use the secure_link module together with the solution described above.
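
A minimal sketch of how secure_link could be wired into such a setup (the location path, secret key and cache layout are assumptions borrowed from the examples above, not a drop-in config):

 location /img/ {
     secure_link $arg_md5,$arg_expires;
     secure_link_md5 "$secure_link_expires$uri secret_key";

     if ($secure_link = "")  { return 403; }   # missing or invalid signature
     if ($secure_link = "0") { return 410; }   # link has expired

     root /var/www/ssd/cache0/;
     try_files $uri @cache1;
 }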





UPD1: I still advise using a kernel >= 2.6.37, because I recently had a major cache crash on 2.6.35 caused by the SSD running out of space for metadata. As a result, I had to reformat several SSDs and reassemble the btrfs RAID. :(

Source: https://habr.com/ru/post/108958/

