How to build a CDN for your site and why it is useful for high-load projects

The main task of the operations department at Sports.ru and Tribuna.com is scaling the network infrastructure amid constant traffic growth (over 1.5 years, both traffic and requests per second doubled), regular peak loads, and an audience spread across different countries. We use various technologies to solve this problem. One of them is our own CDN (Content Delivery Network), which lets us reduce the load, improve protection against DDoS attacks, and speed up site loading in remote regions. We decided to share our experience in this area and wrote a short practical guide for system administrators on deploying and operating their own CDN.

I. Theory.

1. Terminology.

A content delivery network (or content distribution network, CDN) is a geographically distributed network infrastructure that optimizes the delivery and distribution of content to end users on the Internet. By using a CDN, content providers increase the speed at which Internet users download audio, video, software, games, and other types of digital content from the CDN's points of presence.

So says the all-knowing Wikipedia. But how does a CDN actually work? First, let's settle the grammar: a CDN is a blah-blah-blah network, and since “network” is a feminine noun in Russian, we treat the abbreviation as feminine too.
In terms of how it works, a CDN can be summed up by the following formula:

CDN = anycast + proxy.

You can read about anycast here: ru.wikipedia.org/wiki/Anycast. If you need a reminder of what a proxy server is, you can fill in the gaps here: ru.wikipedia.org/wiki/%D0%9F%D1%80%D0%BE%D0%BA%D1%81%D0%B8-%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80

In essence, the technology boils down to two things: announcing, from these “geographically distributed sites”, the network containing the address that the CDN-hosted site's name resolves to, and proxying requests to what is, notionally, a single server.
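To make the anycast half of the formula concrete, here is a toy model in Python. It is a simplified sketch, not real BGP: the same prefix is announced from several nodes, and each client region's routers prefer the announcement with the shortest AS path (real best-path selection weighs more attributes). The node names, regions and path lengths are hypothetical. Removing a node from the set of announcers shows what happens when a node fails: its route is withdrawn and clients fall back to the next-best node.

```python
from __future__ import annotations

# Hypothetical AS-path lengths from each client region to each CDN node.
AS_PATH_LEN = {
    "moscow-clients": {"cdn-moscow": 1, "cdn-novosibirsk": 3, "cdn-frankfurt": 2},
    "siberia-clients": {"cdn-moscow": 3, "cdn-novosibirsk": 1, "cdn-frankfurt": 4},
    "europe-clients": {"cdn-moscow": 2, "cdn-novosibirsk": 4, "cdn-frankfurt": 1},
}


def pick_node(region: str, announcing: set[str]) -> str | None:
    """Return the node whose announcement the region's routers prefer.

    Simplification: only the AS-path length is compared; real BGP
    best-path selection also looks at local preference, MED, etc.
    """
    candidates = {
        node: length
        for node, length in AS_PATH_LEN[region].items()
        if node in announcing  # a failed node no longer announces the prefix
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    nodes = {"cdn-moscow", "cdn-novosibirsk", "cdn-frankfurt"}
    for region in AS_PATH_LEN:
        print(region, "->", pick_node(region, nodes))
    # The Novosibirsk node goes down: its BGP session drops, the route is withdrawn.
    nodes.discard("cdn-novosibirsk")
    print("siberia-clients ->", pick_node("siberia-clients", nodes))
```

With all three nodes up, each region lands on its local node; after the Novosibirsk node disappears, Siberian clients are routed to Moscow without any change on the client side.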

What do we gain from this?
By placing data closer to the client, you get lower response latency for the user, less load on your server, protection from DDoS... blah blah blah: read the description of any commercial CDN.

Seems difficult and incomprehensible? It is actually easier than it looks; everything is described in more detail below.

2. Why your own CDN and not a third-party one.

The question arises: why build your own CDN when there are plenty of great CDN networks to use?
First, that is not our way ;)
Second, these networks have already been built, and it is not a given that they will fit your audience's distribution one hundred percent. With our own CDN, we are free to place its nodes anywhere.
Third, we invest in our own infrastructure, not in someone else's.
Fourth, we can customize our CDN however we like. We can cache not only static data but also dynamic data, for example pages for anonymous users or shared data. No commercial network will give us this much flexibility.

3. Benefits.

So, what exactly can you get from your own CDN:

Faster downloads / less traffic.
By placing data closer to the user, you can expect them to receive it faster. That much is logical. What about traffic reduction? Caching data on the nodes reduces the number of requests to the “main server”. In addition, data can always be transferred from the main server to the CDN node in compressed form, over keepalive connections. You can also configure the CDN node so that concurrent requests for the same cacheable content are not sent to the origin in parallel. This also saves traffic and CPU time on the “main server”. For sports.ru, all of this adds up to a 3.5-fold reduction in the number of requests and the amount of traffic reaching the “main server”.
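The idea of not sending concurrent identical requests to the origin (nginx implements it with the proxy_cache_lock directive) can be sketched in Python as a cache with a per-key lock: the first request for an uncached key goes to the origin, and requests that arrive in the meantime wait and reuse its result. This is an illustrative sketch only; the class name, the slow_origin function and the 0.1-second delay are hypothetical, and cache expiry is not modelled.

```python
import threading
import time


class CollapsingCache:
    """Cache that sends at most one origin request per uncached key at a time."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects _cache and _key_locks

    def get(self, key):
        with self._guard:
            if key in self._cache:
                return self._cache[key]  # cache hit
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:  # only one thread per key gets past this point at a time
            with self._guard:
                if key in self._cache:  # filled while we were waiting
                    return self._cache[key]
            value = self._fetch(key)
            with self._guard:
                self._cache[key] = value
                self._key_locks.pop(key, None)
            return value


def demo(n_clients=20):
    """Fire n_clients simultaneous requests for one key; return origin call count."""
    origin_calls = []

    def slow_origin(key):
        origin_calls.append(key)
        time.sleep(0.1)  # pretend the origin is slow to respond
        return f"<page for {key}>"

    cache = CollapsingCache(slow_origin)
    threads = [threading.Thread(target=cache.get, args=("/news",)) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(origin_calls)


if __name__ == "__main__":
    print("origin requests:", demo())
```

With 20 simultaneous clients, demo() reports a single origin request: the other 19 are served from the result of the first.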
A CDN node is also a great place for a slave DNS server, for the same reasons.

II. Practice.

4. Prerequisites.

To build your own CDN server, you need the server itself, your own AS (ru.wikipedia.org/wiki/%D0%90%D0%B2%D1%82%D0%BE%D0%BD%D0%BE%D0%BC%D0%BD%D0%B0%D1%8F_%D1%81%D0%B8%D1%81%D1%82%D0%B5%D0%BC%D0%B0_(%D0%B8%D0%BD%D1%82%D0%B5%D1%80%D0%BD%D0%B5%D1%82)) and a free IP network in which to place the CDN. There are almost no free IPv4 networks left, but nothing stops you from applying this article to IPv6 :)

You will also need some way to proxy incoming requests and cache the responses.
Fault tolerance comes from running the proxy and the anycast announcement on the same server: if that server goes down for some reason, its announcement disappears with it, and the CDN as a whole is not affected. Naturally, there should be as many such servers as possible ;)

5. How To.

To implement all this, you will need a server with several physical cores, so that the network card's load can be spread across them, and enough RAM to put the entire cache on a ramdisk. We don't want to load the disks on a faraway server, do we? ;) You will also need a network card that can distribute its load across different processor cores for higher packet throughput, and RAID1 on SATA disks for reliability.

And all of this has to be properly configured to work at full capacity. FreeBSD 9.x is well suited to tuning for heavy web loads and squeezing every last drop out of the hardware. Linux can also be used, but for transparency and ease of configuration, thanks to its uniform config style, FreeBSD wins in my personal ranking.
Rather than repeat what has already been written, here are some practical recommendations for configuring FreeBSD for this type of load:

dadv.livejournal.com/139170.html
serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel

or google the topic.

I suggest placing the advertised network, or part of it, on the loopback interface.

Add the following lines to /etc/rc.conf:

ifconfig_lo0_alias0="inet <ip-address of CDN1>/32"
ifconfig_lo0_alias1="inet <ip-address of CDN2>/32"

and, of course, don't forget to enable routing (packet forwarding).

Add the following line to /etc/sysctl.conf:

net.inet.ip.forwarding=1

Enable a ramdisk for the data cache; we will use tmpfs for this.

Add the following line to /boot/loader.conf:

tmpfs_load="YES"

and to /etc/fstab:

tmpfs /mnt/tmpfs tmpfs rw,mode=777 0 0

We will announce the CDN network with OpenBGPD:
www.freshports.org/net/openbgpd

It has all the necessary functionality and is easy to configure. The scarcity of information about it on the Internet is offset by its detailed man page. It can also integrate with the PF packet filter if we want to use it. Here is a simple config that does everything we need:

/usr/local/etc/bgpd.conf:

AS <our AS number>
router-id <router-id>
network <advertised network>
group "Uplink" {
    neighbor <provider address> {
        remote-as <provider's AS number>
        descr "uplink"
        announce self
    }
}
deny from any
deny to any
allow from <provider address>
allow to <provider address> prefix <advertised network>
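One thing that is easy to miss in this config: in bgpd.conf, filter rules are evaluated in order and the last matching rule decides, so the two general deny rules act as defaults that the later allow rules override. The Python sketch below models only that last-match logic for the four rules above; it ignores all other filter options, so check bgpd.conf(5) for your OpenBGPD version. The provider address 198.51.100.1 and the network 203.0.113.0/24 are hypothetical documentation addresses standing in for the placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import ip_network


@dataclass
class Rule:
    action: str                # "allow" or "deny"
    direction: str             # "from" = routes received, "to" = routes sent
    peer: str | None = None    # None means "any"
    prefix: str | None = None  # None means any prefix

    def matches(self, direction: str, peer: str, prefix: str) -> bool:
        if self.direction != direction:
            return False
        if self.peer is not None and self.peer != peer:
            return False
        if self.prefix is not None and ip_network(self.prefix) != ip_network(prefix):
            return False
        return True


def evaluate(rules: list[Rule], direction: str, peer: str, prefix: str) -> str | None:
    """Return the action of the last matching rule (None if nothing matches)."""
    verdict = None
    for rule in rules:
        if rule.matches(direction, peer, prefix):
            verdict = rule.action  # later matches override earlier ones
    return verdict


PROVIDER = "198.51.100.1"   # hypothetical <provider address>
OUR_NET = "203.0.113.0/24"  # hypothetical <advertised network>

RULES = [
    Rule("deny", "from"),                    # deny from any
    Rule("deny", "to"),                      # deny to any
    Rule("allow", "from", PROVIDER),         # allow from <provider address>
    Rule("allow", "to", PROVIDER, OUR_NET),  # allow to <provider address> prefix <advertised network>
]

if __name__ == "__main__":
    print(evaluate(RULES, "to", PROVIDER, OUR_NET))       # allow
    print(evaluate(RULES, "to", PROVIDER, "10.0.0.0/8"))  # deny
```

The net effect is that the node accepts routes only from its provider and announces only its own CDN network, so a misconfiguration elsewhere on the box cannot leak other prefixes upstream.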

Now let's move on to the Nginx configuration. Essentially, we need to set up Nginx as a caching proxy server that caches static files, compresses the remaining responses, and caches other data as well.

When building Nginx, don't forget to include the gunzip module (--with-http_gunzip_module), which decompresses data for clients that do not support compression.
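What the gunzip module does can be sketched in a few lines of Python: the node keeps the gzip-compressed copy it received from the origin and decompresses it only for clients whose Accept-Encoding header lacks gzip. This is a simplified model of the behaviour, not nginx's code; the respond function is hypothetical, and it ignores q-values such as gzip;q=0.

```python
from __future__ import annotations

import gzip


def respond(cached_gz: bytes, accept_encoding: str) -> tuple[dict[str, str], bytes]:
    """Return (extra headers, body) for a gzip-compressed cached object.

    Simplified: a client "supports gzip" if gzip appears in Accept-Encoding;
    q-values (e.g. gzip;q=0) are ignored.
    """
    codings = {
        item.split(";")[0].strip().lower()
        for item in accept_encoding.split(",")
        if item.strip()
    }
    if "gzip" in codings:
        return {"Content-Encoding": "gzip"}, cached_gz  # pass through as is
    return {}, gzip.decompress(cached_gz)  # decompress for this client


if __name__ == "__main__":
    stored = gzip.compress(b"<html>...</html>")  # what the node got from the origin
    print(respond(stored, "gzip, deflate")[0])   # {'Content-Encoding': 'gzip'}
    print(respond(stored, "identity")[1])         # b'<html>...</html>'
```

Storing only the compressed copy keeps the ramdisk cache small and the origin-to-node traffic low, at the cost of some CPU on the node for the rare clients that cannot accept gzip.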

nginx.conf:

worker_processes auto;
# nginx refuses to start without an events block.
events {
}
http {
    include mime.types;
    # Temporary files and the cache live on the tmpfs ramdisk.
    proxy_temp_path /mnt/tmpfs/tmp;
    proxy_cache_path /mnt/tmpfs/cache/site_cache levels=1:2 keys_zone=site:128m max_size=<cache size> inactive=<how long unused data stays in the cache>;
    # Compress responses; text/html is always compressed, so it is not listed in gzip_types.
    gzip on;
    gzip_disable "msie6";
    gzip_comp_level 4;
    gzip_types text/plain application/xml application/x-javascript text/javascript text/css application/json text/xml application/rss+xml;
    # Decompress gzipped responses for clients that do not support gzip.
    gunzip on;
    server {
        listen 80 default_server;
        server_name localhost;
        # Dynamic pages: cached according to the origin's caching headers.
        location / {
            proxy_cache_use_stale updating timeout http_500 http_502 http_504;
            proxy_cache site;
            proxy_cache_key $uri$is_args$args;
            proxy_pass http://frontend;
            # HTTP/1.1 with an empty Connection header allows keepalive to the upstream.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
        }
        # Static files: 200, 301 and 302 responses are cached for the proxy_cache_valid
        # time unless the origin's caching headers say otherwise.
        location ~* \.(3gp|7z|avi|bmp|css|doc|docx|gif|gz|jpg|jpeg|js|mov|mp3|mp4|ogg|png|ppt|pptx|rar|tar|tiff|torrent|ttf|svg|swf|wma|xls|xlsx|xsl|xslt|zip)$ {
            proxy_cache_use_stale timeout updating http_500 http_502 http_504;
            proxy_cache site;
            proxy_cache_key $uri$is_args$args;
            proxy_cache_valid <how long to cache static files>;
            proxy_pass http://frontend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
        }
    }
    upstream frontend {
        server <frontend address>;
        # Idle keepalive connections kept open to the origin; required for connection reuse.
        keepalive <number of idle connections>;
    }
}
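To see how this config lays out the cache on the ramdisk: per the nginx documentation, a cache file's name is the MD5 of the proxy_cache_key value, and levels=1:2 takes the first directory level from the last hex character of the hash and the second from the two characters before it. The Python sketch below reproduces that for the $uri$is_args$args key. It is an approximation (nginx normalizes $uri, which is not modelled here), and the helper names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def cache_key(request_uri: str) -> str:
    """Approximate proxy_cache_key $uri$is_args$args (no $uri normalization)."""
    parts = urlsplit(request_uri)
    is_args = "?" if parts.query else ""
    return f"{parts.path}{is_args}{parts.query}"


def cache_file_path(cache_root: str, key: str, levels=(1, 2)) -> str:
    """Path of the cache file for key, as laid out by levels=1:2."""
    digest = hashlib.md5(key.encode()).hexdigest()
    dirs, end = [], len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])  # taken from the end of the hash
        end -= width
    return "/".join([cache_root.rstrip("/"), *dirs, digest])


if __name__ == "__main__":
    key = cache_key("/news/123?page=2")
    print(key)  # /news/123?page=2
    print(cache_file_path("/mnt/tmpfs/cache/site_cache", key))
```

Note that the Host header is not part of this key, so if one node serves several sites, two sites with the same URL paths would share cache entries; adding $host to proxy_cache_key avoids that.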

If for some reason you don't need to cache dynamic requests on the site, turn off caching in the location / section. You can invalidate cached static files by changing a GET parameter in their URLs, for example by specifying the revision number.
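Because the cache key includes $is_args$args, a new revision in the query string produces a new key, so the nodes fetch the fresh file from the origin while the old entry simply expires after the inactive period. A small Python helper for building such URLs might look like this; the parameter name v and the function name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, revision: str, param: str = "v") -> str:
    """Set ?<param>=<revision> on url, replacing any previous revision."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != param  # drop the old revision, keep other parameters
    ]
    query.append((param, revision))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(versioned_url("/static/app.js", "1042"))         # /static/app.js?v=1042
    print(versioned_url("/static/app.js?v=1041", "1042"))  # /static/app.js?v=1042
```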

BIND
Everything is simple here: install it and add the following lines to named.conf:

zone "site.ru" {
    type slave;
    file "/etc/namedb/slave/site.ru";
    masters {
        <ip-address of the dns-master server>;
    };
};

Don't forget to allow transfers of the entire zone to this server on the master (in BIND, via the allow-transfer option).

That's all! The CDN server is configured. It will take a little extra work if you want to serve a few more sites through your CDN ;)

Source: https://habr.com/ru/post/198598/
