
Keeping the site from going down: a budget method



Sites go down. I have worked in hosting for 7 years, and for the last 5 years (among other things) I have been providing geographically distributed clusters, so that if one data center has an accident, the site keeps working in another. For the customer, such a solution costs at least 4 thousand rubles per month per virtual server. For a small online store that can be expensive "insurance" that is needed 1-3 times a year and, with luck, never. So many people need a cheaper option suited to small and medium businesses. Here is a very simple way to get one.

An important preface


Last time I talked about the principles of building a budget fault-tolerant cluster. A full cluster needs at least twice the hardware so that it can keep working when part of it fails. In this post I will explain how to avoid duplicating the system and still keep your customers when the main server fails.

So, when the site of an online store or, say, a cafe goes down, there are two problems:
  1. Customers cannot reach the site and you lose orders. Outages rarely last more than a day, so you usually just write off one day of sales.
  2. Worse, the site may drop out of search results, because search robots see a 503 or 404 error instead of the site, or see nothing at all.

The first problem is usually easy enough for a small business to bear (9-15 thousand in turnover at a 30% margin is roughly 3-5 thousand rubles, which is not terrible), but a drop in search rankings is much more expensive. By the second hour of the outage your SEO specialist's face will be a frightening sight. The damage, if you are unlucky, is roughly your monthly promotion budget. That is why it is worth "laying down some straw" in advance, in case of a fall.

We talked to Milfgard about fault-tolerant solutions, and he described how Mosigra has been solving this problem for many years. During a site outage, customers are simply switched to a static HTML stub with basic information.

This is a page showing general information about the company and a contact phone number, plus perhaps a few more pages that money has been invested in. Such a stub needs between 1 and 20 pages (roughly, the home page and a few pages with top products). Beyond that, it already makes sense to think about a cluster. A "stub" system like this is noticeably simpler to build and maintain than a full-fledged cluster.

The essence of the method


  1. The main pages are taken from the site (the home page plus some of your choice), and a static stub site is generated from them automatically. Clicking any link that leads to a page not included in the stub shows a message like: "The site is temporarily unavailable, call 123 and we will take your order." The stub is hosted on a server independent of the hosting where the main site runs.
  2. To stay current (prices, design changes, etc.), the stub site is automatically refreshed once a week.
  3. The domain is delegated to a reliable DNS service that can be managed through an API (in my case Yandex, because it is fault tolerant in itself).
  4. The backup server monitors the main server and, on failure, changes the domain's IP address to the address of the backup server. The check runs once a minute, and if the robot gets an error 3 times in a row, the domain's A record is switched to the backup server. When the main site recovers, the record is changed back.
  5. Each time the record is switched, in either direction, an SMS notification is sent to the owner or administrator.
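
Step 1 can be illustrated with a minimal Python sketch of the link rewriting. It is only an assumption about how such a stub could be produced, not the generator actually used: the function name `rewrite_links`, the page `/unavailable.html` and the host `shop.example` are hypothetical, and only double-quoted `href` attributes on `<a>` tags are handled.

```python
import re
from typing import AbstractSet
from urllib.parse import urlparse

# Hypothetical page holding the "site is temporarily unavailable, call us" text.
UNAVAILABLE_PAGE = "/unavailable.html"

# Matches the href value of <a ...> tags written with double quotes.
A_HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")')


def rewrite_links(html: str, site_host: str, stub_paths: AbstractSet[str]) -> str:
    """Point every internal link that is not part of the stub at the notice page."""

    def replace(match: "re.Match[str]") -> str:
        url = urlparse(match.group(2))
        # Leave mailto:, tel:, external hosts and pure "#anchor" links alone.
        if url.scheme not in ("", "http", "https"):
            return match.group(0)
        if url.netloc not in ("", site_host):
            return match.group(0)
        if not url.path:
            return match.group(0)
        target = url.path if url.path in stub_paths else UNAVAILABLE_PAGE
        return match.group(1) + target + match.group(3)

    return A_HREF_RE.sub(replace, html)
```

For example, with `stub_paths = {"/", "/top.html"}` a link to `/catalog/42` is redirected to the notice page, while links to the home page, to `/top.html` and to other domains are kept.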


In other words: we copy the main pages of any site, build a static stub from them, and make sure that when the main site goes down, customers are switched to our static version. Once the main site is restored, we switch back.
The switch itself takes about 1.5 minutes, so the site is still down for that time (the TTL), give or take a couple of minutes. When we first started testing, the delay was about 12-17 minutes; now everything is much faster, since there were several options to try.
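
As a rough back-of-the-envelope check of these numbers, the visible downtime can be estimated as detection time plus DNS cache expiry. The sketch below uses the values from this article (a check every 60 seconds, 3 failures in a row, a 90-second TTL); the function name `downtime_bounds` is hypothetical, and the estimate ignores request timeouts, the time the DNS service needs to apply the change, and resolvers that do not honor the TTL.

```python
CHECK_INTERVAL_S = 60    # monitoring checks the site once a minute
FAILURES_TO_SWITCH = 3   # the record is switched after 3 errors in a row
DNS_TTL_S = 90           # TTL of the A record


def downtime_bounds(check_interval: int, failures: int, ttl: int) -> tuple:
    """Return (best, worst) seconds between the outage start and visitors seeing the stub."""
    # Best case: the outage starts just before a check and the visitor's
    # resolver has no cached record, so only the remaining checks are waited for.
    best = (failures - 1) * check_interval
    # Worst case: the outage starts just after a successful check and the
    # resolver cached the old record right before the switch.
    worst = failures * check_interval + ttl
    return best, worst


best, worst = downtime_bounds(CHECK_INTERVAL_S, FAILURES_TO_SWITCH, DNS_TTL_S)
print(f"{best / 60:.1f} to {worst / 60:.1f} minutes")
```

Under these assumptions a visitor waits somewhere between 2 and 4.5 minutes, which is in line with "the TTL, give or take a couple of minutes".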

Advantages over a cluster


  1. Very simple to implement and quick to set up.
  2. Incomparably cheaper.
  3. Often this is enough to keep a buyer who came from an ad: the operator gives the details over the phone, and in general the customer sees a live, working site instead of an incomprehensible error.
  4. Requires no support from the main site and works with any engine and technology.
  5. Protects against overloads, bugs in the site or database software, hacks, attacks, etc. In any bad scenario you can switch the domain to the static site and it will simply work. A site made of a few static HTML files with pictures is hard to break, and it will withstand noticeably more than the main site with its many features and large database.
  6. Works with any hosting, as long as it supports external DNS servers (the vast majority do).
  7. The client does not have to move their site to our hosting at all. It is enough to order the stub service; nothing on the site itself needs to be touched.


Once again: such a stub requires no changes to the site, the hosting, etc. The site keeps working as before, and if trouble comes, the straw is already laid where it will land softly.

Disadvantages


  1. This is a stub: the user cannot search the site, register, log in to a personal account, etc. They can only see basic information (roughly, prices and a phone number to contact the operator).
  2. Switching is delayed by the DNS cache update, about 1.5 minutes.
  3. There is one extra hurdle: the site owner has to delegate the domain to the Yandex DNS servers. This is not hard, but it takes some experience; an ordinary secretary will not manage it.


Practical implementation


  1. The client's domain is delegated to the Yandex DNS servers: they are free and reliable, and there is a simple management API. The TTL is set to the minimum, 90 seconds.
  2. A separate server runs the monitoring service and hosts the static sites. Once a minute, monitoring contacts the main server, downloads the home page and looks for a key phrase that shows the site is working. Usually this is the Yandex.Metrica or Google Analytics code, but you can also insert something special into the main text of the page that is guaranteed to come from a database query. After three failures in a row, the client gets an SMS about the failure and the switch to the backup site.
  3. At the same time, monitoring changes the domain's IP address to the address of the backup site and keeps checking the main site for availability. As soon as the main site is back to normal, the main server's IP address is restored and the client is notified that the domain has switched back to the main site.
  4. If needed, the client can provide FTP access to the files of the site, or an API for downloading an archive, so that the stub is updated automatically (I have not implemented this yet).
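
Steps 2 and 3 can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the actual monitoring service: `set_a_record` and `notify` are hypothetical callables standing in for the DNS management API and the SMS gateway (neither API is shown here), and the IP addresses in the usage note come from the documentation range 192.0.2.0/24.

```python
import http.client
import urllib.request
from typing import Callable, Optional


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download a page; return None on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return None


def page_is_healthy(html: Optional[str], key_phrase: str) -> bool:
    """The site counts as working only if the key phrase is present in the page."""
    return html is not None and key_phrase in html


class FailoverMonitor:
    """Switches the A record after N failed checks in a row and back on recovery."""

    def __init__(self, main_ip: str, backup_ip: str,
                 set_a_record: Callable[[str], None],  # hypothetical DNS API wrapper
                 notify: Callable[[str], None],        # hypothetical SMS sender
                 failures_to_switch: int = 3) -> None:
        self.main_ip = main_ip
        self.backup_ip = backup_ip
        self.set_a_record = set_a_record
        self.notify = notify
        self.failures_to_switch = failures_to_switch
        self.consecutive_failures = 0
        self.on_backup = False

    def record_check(self, healthy: bool) -> None:
        """Feed in the result of one check (called once a minute)."""
        if healthy:
            self.consecutive_failures = 0
            if self.on_backup:
                self.set_a_record(self.main_ip)
                self.on_backup = False
                self.notify("Main site is back, domain switched to the main server")
            return
        self.consecutive_failures += 1
        if not self.on_backup and self.consecutive_failures >= self.failures_to_switch:
            self.set_a_record(self.backup_ip)
            self.on_backup = True
            self.notify("Main site is down, domain switched to the stub")
```

A scheduler would then call something like `monitor.record_check(page_is_healthy(fetch_page(url), key_phrase))` once a minute, with `monitor = FailoverMonitor("192.0.2.10", "192.0.2.20", set_a_record, notify)`. Keeping the DNS and SMS calls behind plain callables makes the switching logic easy to test without touching the real domain.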


See it in action


See how it works on these test sites:

splasher-test-00.inf1f2.ru - returns an error from minute 0 through minute 14 of every hour
splasher-test-15.inf1f2.ru - returns an error from minute 15 through minute 29 of every hour
splasher-test-30.inf1f2.ru - returns an error from minute 30 through minute 44 of every hour
splasher-test-45.inf1f2.ru - returns an error from minute 45 through minute 59 of every hour
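
To pick which test site to open at a given moment, the schedule above can be written as a small Python helper. The function name `failing_test_site` is hypothetical; the host names and minute windows are the ones listed above.

```python
# Start minute of the 15-minute error window for each test site.
TEST_SITES = {
    0: "splasher-test-00.inf1f2.ru",
    15: "splasher-test-15.inf1f2.ru",
    30: "splasher-test-30.inf1f2.ru",
    45: "splasher-test-45.inf1f2.ru",
}


def failing_test_site(minute_of_hour: int) -> str:
    """Return the test site whose main server returns an error at this minute."""
    if not 0 <= minute_of_hour <= 59:
        raise ValueError("minute_of_hour must be in 0..59")
    window_start = (minute_of_hour // 15) * 15
    return TEST_SITES[window_start]
```

Keep in mind that the stub appears only after the monitoring and DNS delay described earlier, so the first few minutes of each window may still show the error.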

Source: https://habr.com/ru/post/260277/

