
So you have built a website. It is always exciting to watch the visit counter creep slowly but surely upward, setting a new record every day. But one day, when you least expect it, someone will post a link to your site on Reddit or Hacker News (or on Habr, translator's note), and your server will go down.
Instead of gaining new regular users, you will be left with a blank page. At that moment nothing will help you get the server working again in time, and the traffic will be lost forever. How can you avoid such problems? In this article we will talk about optimization and scaling.
A little about optimization
The main tips are well known: upgrade to the latest version of PHP (OPcache has been built in since 5.5), sort out the indexes in your database, and cache static content (rarely modified pages such as “About us”, “FAQ”, etc.).
One particular aspect of optimization deserves a separate mention: serving static content with a server other than Apache, for example Nginx. Configure Nginx to handle all static content (*.jpg, *.png, *.mp4, *.html ...), and let it pass requests that need server-side processing on to the heavier Apache. This setup is called a reverse proxy.
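As a rough illustration of the split described above, an Nginx server block might look like the sketch below. The domain, the document root and the Apache address (127.0.0.1:8080) are placeholders assumed for this example, not values from a real setup.

```nginx
server {
    listen 80;
    server_name example.com;            # placeholder domain

    # Static files are served by Nginx straight from disk.
    location ~* \.(jpg|png|mp4|html)$ {
        root /var/www/example/public;   # assumed document root
        expires 7d;                     # let browsers cache static assets
    }

    # Everything else is handed to Apache, assumed to listen on port 8080.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The forwarded headers let the application behind Apache see the original host and client address rather than those of the proxy.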
Scaling
There are two types of scaling - vertical and horizontal.
In my understanding, a site is scalable if it can handle growing traffic without changes to its software.
Vertical scaling.
Imagine a server running a web application. It has 4 GB of RAM, an i5 processor and a 1 TB HDD. It does its job perfectly well, but to cope better with higher traffic you decide to increase the RAM to 16 GB, install an i7 processor and splash out on an SSD. Now the server is much more powerful and copes with high loads. This is vertical scaling.
Horizontal scaling.
Horizontal scaling is the creation of a cluster of interconnected (often not very powerful) servers that serve the site together. In this case a load balancer is used: a machine or program whose main function is to decide which server to send each request to. The servers in the cluster share the work of serving the application without knowing anything about each other, which significantly increases the throughput and resilience of your site.
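The simplest way for a balancer to decide where a request goes is to rotate through the nodes in turn (round robin). The following is a minimal sketch of that idea only, assuming PHP 7.4 or later; the class name and node addresses are hypothetical, and a real balancer would also track node health and load.

```php
<?php

/**
 * Illustrative round-robin balancer: hands out nodes in rotation.
 * The class name and the node addresses are hypothetical.
 */
final class RoundRobinBalancer
{
    /** @var string[] */
    private array $nodes;
    private int $next = 0;

    public function __construct(array $nodes)
    {
        if ($nodes === []) {
            throw new InvalidArgumentException('At least one node is required.');
        }
        $this->nodes = array_values($nodes);
    }

    /** Returns the node that should receive the next request. */
    public function pick(): string
    {
        $node = $this->nodes[$this->next];
        $this->next = ($this->next + 1) % count($this->nodes);
        return $node;
    }
}

$balancer = new RoundRobinBalancer(['10.0.0.11', '10.0.0.12', '10.0.0.13']);
foreach (range(1, 4) as $i) {
    // The fourth request wraps around to the first node again.
    echo "request $i -> ", $balancer->pick(), PHP_EOL;
}
```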
There are two types of balancers: hardware and software. A software balancer is installed on a regular server and receives all traffic, passing it on to the appropriate handlers. Nginx, for example, can act as such a balancer. In the optimization section above, it intercepted all requests for static files and served them itself, without burdening Apache. Another popular piece of load-balancing software is Squid. Personally, I always use it because it provides an excellent, user-friendly interface for controlling the deepest aspects of balancing.
A hardware balancer is a dedicated machine whose sole purpose is to distribute load. Usually no software runs on it other than what the manufacturer developed. You can read more about hardware load balancers here.
Please note that these two methods are not mutually exclusive. You can vertically scale any machine (aka node) in your system.
In this article, we discuss horizontal scaling, because it is cheaper and more efficient, although more difficult to implement.
Persistent connection
Scaling PHP applications raises several difficult problems. One of them is handling user session data. If you have logged in to the site and the balancer sends your next request to a different machine, that machine will not know that you are already logged in. One answer is a persistent connection: the balancer remembers which node handled the user's previous request and sends the next request to the same node. However, this overloads the balancer with responsibilities. Besides processing hundreds of thousands of requests, it also has to remember exactly how it routed each of them, and as a result the balancer itself becomes a bottleneck in the system.
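The bookkeeping burden is easier to see in code. The sketch below assumes PHP 7.4 or later and uses a hypothetical class name, client keys and node addresses. It keeps a table that maps every client to the node chosen for its first request, so the table grows with each new client the balancer sees.

```php
<?php

/**
 * Illustrative sticky balancer: remembers which node served each client.
 * The class name, client keys and node addresses are hypothetical.
 */
final class StickyBalancer
{
    /** @var string[] */
    private array $nodes;
    /** @var array<string, string> client key => node */
    private array $assignments = [];
    private int $next = 0;

    public function __construct(array $nodes)
    {
        if ($nodes === []) {
            throw new InvalidArgumentException('At least one node is required.');
        }
        $this->nodes = array_values($nodes);
    }

    public function route(string $clientKey): string
    {
        if (!isset($this->assignments[$clientKey])) {
            // First request from this client: assign the next node in rotation.
            $this->assignments[$clientKey] = $this->nodes[$this->next];
            $this->next = ($this->next + 1) % count($this->nodes);
        }
        return $this->assignments[$clientKey];
    }

    /** Number of clients the balancer currently has to remember. */
    public function trackedClients(): int
    {
        return count($this->assignments);
    }
}

$balancer = new StickyBalancer(['10.0.0.11', '10.0.0.12']);
echo $balancer->route('session-alice'), PHP_EOL; // first node
echo $balancer->route('session-bob'), PHP_EOL;   // second node
echo $balancer->route('session-alice'), PHP_EOL; // same node as alice's first request
echo $balancer->trackedClients(), PHP_EOL;       // 2
```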
Sharing local data.
Sharing user session data among all the nodes of the cluster seems like a good idea. Although this approach requires some changes to the architecture of your application, it is worth it: the balancer is relieved of extra work and the whole cluster becomes more fault tolerant. The death of one server does not affect the operation of the system as a whole.
As we know, session data is stored in the $_SESSION superglobal array, which by default reads its data from and writes it to a file on disk. If that disk belongs to a single server, other servers obviously have no access to it. How do we make the data available on multiple machines?
First, note that you can override the session handler in PHP. You can implement your own class for working with sessions.
Using the database to store sessions
With our own session handler, we can store sessions in a database. The database can live on a separate server (or even a cluster). This method usually works fine, but under really heavy traffic the database becomes a bottleneck, because it has to serve every server, each of which is trying to read or write session data. And if the database goes down, the whole site stops working.
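A database-backed handler can be sketched with PHP's SessionHandlerInterface and PDO. This is one possible shape, not a production implementation. It assumes PHP 8.0 or later, a hypothetical `sessions` table, and a database that understands `REPLACE INTO` (MySQL and SQLite do). An in-memory SQLite database stands in for the shared database server so that the example runs on its own.

```php
<?php

/**
 * Sketch of a database-backed session handler.
 * The `sessions` table and its columns are assumptions made for this example.
 */
final class PdoSessionHandler implements SessionHandlerInterface
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function open(string $path, string $name): bool
    {
        return true; // the PDO connection is already open
    }

    public function close(): bool
    {
        return true;
    }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = :id');
        $stmt->execute([':id' => $id]);
        $data = $stmt->fetchColumn();
        // An unknown session is reported as an empty string, not as an error.
        return $data === false ? '' : (string) $data;
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (:id, :data, :ts)'
        );
        return $stmt->execute([':id' => $id, ':data' => $data, ':ts' => time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = :id');
        return $stmt->execute([':id' => $id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        // Remove sessions that have not been written to within the lifetime.
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < :limit');
        $stmt->execute([':limit' => time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Stand-in connection; a real cluster would point every node at the same database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE IF NOT EXISTS sessions (
        id VARCHAR(128) PRIMARY KEY,
        data TEXT NOT NULL,
        updated_at INTEGER NOT NULL
    )'
);

session_set_save_handler(new PdoSessionHandler($pdo), true);
session_start();
$_SESSION['user_id'] = 42;
session_write_close(); // flushes the session through PdoSessionHandler::write()
```

Every read and write of session data becomes a database query, which is why the database turns into the bottleneck described above.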
Distributed file system
Perhaps you think that it would be nice to set up a network file system where all servers could write session data.
Do not do this! This approach is very slow and can lead to data corruption or even data loss. If, for some reason, you still decide to use this method, I recommend GlusterFS.
Memcached
You can also use memcached to store session data in RAM. However, this is not safe, because memcached evicts old data when it runs out of free space. You are probably wondering: isn't RAM local to each machine? How can it serve the entire cluster?
Memcached can combine the RAM available on different machines into one pool.
The more machines you have, the more memory you can contribute to this pool. You do not have to give all of a machine's memory to the pool (although you can); you can donate an arbitrary amount from each machine. This lets you keep most of the memory for normal use and set aside a piece for the cache, which can hold not only sessions but other relevant data as well.
Memcached is an excellent and widespread solution.
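The pooling happens on the client side: the memcached servers do not talk to each other, and the client decides which server owns a key by hashing that key. The sketch below shows the idea with a deliberately simplified modulo scheme. The function name and server addresses are hypothetical, and real client libraries offer more robust strategies such as consistent hashing.

```php
<?php

/**
 * Picks the server responsible for a key by hashing the key.
 * Simplified modulo scheme, for illustration only.
 *
 * @param string[] $servers zero-indexed list of "host:port" strings
 */
function serverForKey(string $key, array $servers): string
{
    // Mask to 31 bits so the hash is non-negative on 32-bit builds too.
    $hash = crc32($key) & 0x7FFFFFFF;
    return $servers[$hash % count($servers)];
}

// Hypothetical pool members.
$servers = ['10.0.0.11:11211', '10.0.0.12:11211', '10.0.0.13:11211'];

foreach (['sess_a1b2c3', 'sess_d4e5f6', 'page_cache_faq'] as $key) {
    // The same key always maps to the same server, whichever web node asks.
    echo $key, ' -> ', serverForKey($key, $servers), PHP_EOL;
}
```

With the PHP Memcached extension, a comparable server list is passed to `Memcached::addServers()`, and the client library performs the mapping itself.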
To use this approach, you need to edit php.ini a little:
session.save_handler = memcache
session.save_path = "tcp://path.to.memcached.server:port"
Redis cluster
Redis is a NoSQL data store that keeps its data set in RAM. Unlike memcached, it supports persistent storage and more complex data types.
At the time of writing, Redis does not support clustering, so using it for horizontal scaling is somewhat difficult. This is temporary, however: an alpha version of the cluster solution is already out.
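For comparison with the memcached settings, the phpredis extension registers a session handler that is configured in php.ini in much the same way. The fragment below is a sketch that assumes phpredis is installed and a single Redis instance is used; the host name is a placeholder and 6379 is the default Redis port.

```ini
; Assumes the phpredis extension is loaded
session.save_handler = redis
session.save_path = "tcp://path.to.redis.server:6379"
```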
Other solutions
ZSCM is a good alternative from Zend, but it requires Zend Server on each node.
If you are interested in other NoSQL stores and caching systems, try Scache, Cassandra or Couchbase.
Summary
As you can see, scaling out PHP applications is not that easy. There are many difficulties, and most of the solutions are not interchangeable, so you have to choose one and stick with it to the end: once traffic goes off the scale, there is no longer any chance of moving smoothly to something else.
I hope this small guide will help you choose the approach to scaling for your project.
In the second part of the article we will talk about database scaling.