For those who got the marketing right, or simply caught the wave, Black Friday means hype, a flood of orders and crowds of customers.
The infrastructure has to be prepared for the influx in advance, but who thinks about such things in advance? Sometimes the decision to take part is made only the day before.
So the festival of consumerism begins: the online store's servers start blinking merrily, the call center overheats, and the delivery services offer delivery sometime in January.
What to do: relax and take the failures philosophically, or fight bravely?

I have supported the server infrastructure of online stores during Black Friday, and not once was I approached in advance or given time to prepare. I am sharing my experience with those who will get the same kind of request today.
(You are lucky if you can do it properly: set up monitoring several months ahead, analyze traffic, bottlenecks and the project architecture, run stress tests, rebuild the architecture together with the developers if necessary, and provision additional server capacity in advance. We'll talk about the proper approach some other time; here we'll talk about firefighting measures.)
Set up monitoring
I think monitoring comes first in any case. There will still be problems, but with monitoring graphs you can see where the bottlenecks are right now.
If possible, I use Zabbix / Prometheus / ELK (depending on the architecture); if not, I quickly connect a SaaS such as okmeter.io. Even if the sale lasts only one day, you can't stare at a screen full of metrics for 24 hours straight.
Other great tools are blackfire.io / newrelic.com for profiling and pinba.org for analyzing slow pages across the whole site.
blackfire / newrelic help to pin down the problem on a particular page, while pinba shows which pages are overloaded and take the longest to run (all of this comes out of the box in Bitrix, for example, but try logging in to its admin panel and working there when the site is already struggling).
Cut off the excess
I disable everything that can be turned off: modules that are not strictly needed right now, all sorts of decorative extras, and so on.
A sale is a simple process: a big discount on a number of products. A visitor with burning eyes wants to pick a product before someone else buys it, place an order, get the discount and pay.
Newsletter subscription, registration on the site, service quality surveys: the customer is not interested in any of this right now, so these modules can be disabled or simplified. I turn off everything the site can easily live without for a couple of days.
A case from practice: during Black Friday, I spent two hours debugging on a live server under heavy traffic before it turned out that the delivery service module, which calls external services to automatically calculate the shipping cost for each order, was slowing everything down badly. When traffic grew hundreds of times, these external services simply stopped coping.
It is worth simply sitting down and thinking: what could fail on your site, mobile app, and so on?
Let things fail
I prepare for the fact that some service will fail anyway. When that happens, visitors must still be shown at least something.
For example, a broken delivery service module or payment form should not block the order as a whole; the user can come back the next day and complete the order.
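The idea of not letting a failing delivery module block the order can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the order dictionary, the status strings and the 0.5-second timeout are all assumptions made up for the example, and `slow_service` is only a stand-in for a real external shipping calculator.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool so a slow external call cannot hold up the request itself.
_pool = ThreadPoolExecutor(max_workers=4)


def shipping_cost_or_none(calc, order, timeout_s=0.5):
    """Return the shipping cost, or None if the service fails or is too slow."""
    future = _pool.submit(calc, order)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running keeps going
        return None
    except Exception:
        return None  # any error in the external service is treated as "no cost yet"


def place_order(order, calc, timeout_s=0.5):
    """Accept the order even when the shipping cost could not be calculated."""
    cost = shipping_cost_or_none(calc, order, timeout_s)
    status = "accepted" if cost is not None else "accepted_shipping_pending"
    return {"order_id": order["id"], "shipping_cost": cost, "status": status}


def slow_service(order):
    """Hypothetical stand-in for an overloaded external delivery service."""
    time.sleep(0.3)
    return 10.0


result = place_order({"id": 1}, slow_service, timeout_s=0.05)
```

Here the order is still accepted, with the shipping cost left empty so that it can be calculated later or clarified by a manager.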
On the 50x error pages, I show the email address or phone number of the sales department.
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /srv/www/yourwebsite.com/htdocs/sale-contacts/;
    }
Bring up a copy of the site
If it is possible to have a copy of the site for testing changes, that is very good. And I'm not even talking about a well-established deployment system :)
By the way, fashionable cloud services let you quickly and easily make a copy of the production server.
A case from practice: one site whose infrastructure I helped maintain during Black Friday started to work more or less acceptably under heavy traffic after the developers optimized the code (partly on the fly), resources were added and the software was tuned, but it still slowed down badly at checkout. Users thought their orders were not going through and pressed the checkout button several times just in case. A few individuals pressed the button 300 times! A great stress test :) Hundreds of times more visitors, and 300 extra orders from some of them! :)
CDN services
You can do without a CDN, but if the servers objectively cannot cope with serving the required volume of static content, it is a must.
A CDN can be connected quickly for popular CMSs such as 1C-Bitrix and WordPress. But you definitely won't be able to tune a CDN on the fly, so you have to take care of it in advance.
AntiDDoS
I also strongly recommend connecting AntiDDoS services, and be sure to do so in advance (otherwise, under a sudden load and without having adapted to normal traffic, they may start blocking legitimate visitors).
For a limited period, this can be done for free.
Add server capacity
Plan for the possibility of adding resources. You can add resources to the main server, create a new node to parallelize requests, a separate node for MySQL, and so on. If not you yourself, then the outsourced team will be very grateful for it.
It is convenient if your provider can host both physical and cloud servers (Selectel.ru, Servers.com).
Whew, let's go
The most dangerous time is the first minutes after the mailings go out. The cache has not been warmed up yet, there are few statistics, and you still do not know what the system is capable of (unless you ran serious tests in advance).
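One way to soften the cold-cache problem, which the original does not describe, is to request the most important pages yourself before the mailing goes out. The sketch below only illustrates that idea under assumptions: the function names, the worker count and the timeout are invented for the example, and the list of URLs is something you would have to supply.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout_s=10):
    """Request one URL and return its HTTP status, or None on any error.

    urlopen raises for 4xx/5xx responses and network failures, so those
    all end up as None here.
    """
    try:
        with urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except Exception:
        return None


def warm_cache(urls, fetch=fetch_status, workers=4):
    """Fetch every URL once, a few at a time, and report the result per URL."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))
    return dict(zip(urls, statuses))
```

Calling `warm_cache` with the sale landing page and the top product pages shortly before the mailing gives the cache a chance to fill, and URLs that come back as `None` point at pages worth checking first. A small number of workers is deliberate, so that the warm-up itself does not overload the backend.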
Some configs
Caching in nginx
Let's create a cache of up to 500 MB that keeps all pages, except order pages, for 3 hours.
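    # Cache storage: key zone "blackfriday_cache" with 180 MB of shared memory
    # (a size, not a duration), at most 500 MB on disk;
    # entries not accessed for 7 days are removed
    proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=blackfriday_cache:180m max_size=500m inactive=7d;
    proxy_cache_key "$request_method$scheme$host$request_uri";
    proxy_cache_use_stale error timeout invalid_header http_500;

    # URIs that must not be cached: 1 - do not cache, 0 - cache.
    # Regex prefixes are used because a plain string in map
    # matches only that exact URI.
    map $uri $cookie_nocache {
        ~^/order   "1";
        ~^/bitrix  "1";
        default    "0";
    }

    location / {
        ....
        proxy_hide_header "Set-Cookie";
        # ignore backend headers that would otherwise prevent caching
        proxy_ignore_headers "X-Accel-Expires";
        proxy_ignore_headers "Expires";
        proxy_ignore_headers "Cache-Control";
        proxy_ignore_headers "Set-Cookie";
        add_header X-Cache $upstream_cache_status;
        ...
        proxy_no_cache $cookie_nocache;   # set by the map above; 1 - do not cache
        proxy_cache blackfriday_cache;    # cache zone to use
        proxy_cache_valid 180m;           # cache 200, 301 and 302 responses for 180 minutes
        proxy_cache_valid 404 1m;         # cache 404 responses for 1 minute
        ....
        proxy_pass http://backend;
    }

    location @backend {
        ....
    }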
There is plenty of additional material on this topic. A few points:
A 100 Mbit/s channel can serve about 12 pages of 1 MB each per second, which is about 43 thousand pages per hour; nginx can deliver that volume even on an inexpensive server.
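The arithmetic behind that estimate can be checked in a few lines. The sketch ignores protocol overhead and assumes that 1 MB means one million bytes; the function name is made up for the example.

```python
def pages_per_second(channel_mbit_s, page_mb):
    """Upper bound on pages served per second; 8 bits per byte, no overhead."""
    return channel_mbit_s / 8 / page_mb


rate = pages_per_second(100, 1)         # 12.5 pages per second in theory
whole_pages = int(rate)                 # 12, the rounded-down figure used in the text
pages_per_hour = whole_pages * 3600     # 43200, i.e. about 43 thousand per hour
theoretical_per_hour = int(rate * 3600) # 45000 without rounding down
```

In practice the real figure will be lower because of TCP/TLS overhead and uneven traffic, so the rounded-down estimate is the safer one to plan with.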
Distribute requests across several nodes (the site must be ready to run on several web nodes):
Via Round-Robin DNS
(be careful here: many DNS providers no longer support this method correctly)
    $ dig lifehacker.ru +short
    136.243.37.180
    136.243.37.178
Via nginx upstreams
    $ cat nginx.conf
    upstream backend {
        server backend1.yoursite.com;
        server backend2.yoursite.com;
    }
    server {
        server_name yoursite.com;
        location / {
            proxy_pass http://backend;
        }
        location @backend {
            ....
        }
    }
Via Cloudflare, Qrator, etc.
They let you define several backends right in the control panel, and the configuration update is usually instant.
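What round-robin distribution means for the options above can be pictured with a toy model. This is only an illustration under simplifying assumptions (equal weights, all backends healthy); it is not how nginx or a DNS resolver is implemented, and the function name is invented for the example.

```python
from itertools import cycle, islice


def round_robin(backends, n_requests):
    """Assign n_requests to the backends in strict rotation."""
    return list(islice(cycle(backends), n_requests))


backends = ["backend1.yoursite.com", "backend2.yoursite.com"]
assignments = round_robin(backends, 5)
# backend1, backend2, backend1, backend2, backend1
```

Each node receives roughly the same number of requests, which is why the site must not depend on a visitor always reaching the same node (for example, for sessions stored on local disk).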
Stay calm
Sometimes ideal operation simply cannot be ensured, but the main thing for the business is that the system works at all. Let it be slow, but it must let users place orders rather than endlessly press F5. Thousands of customers use the “miserable, slow, nerve-racking” site at the same time, and they keep placing orders, and every one of those orders is valuable. I have seen cases where a store made half a year's turnover in a single day, and the result was worth all the nerves.
Successful sales to you :)