
Server on steroids: FreeBSD, nginx, MySQL, PostgreSQL, PHP, and more


Introduction


A lot of time has passed since I wrote the previous article on optimizing this stack. That long-suffering Pentium 4 with 512 MB of memory, which simultaneously served up to a thousand people on the forum and up to 150,000 peers on the tracker, has long since been retired to a scrap heap somewhere in Germany, and the club has gone through more than one server since. Everything said in that article is still relevant, but there are things worth adding.
The article is large, so it will be divided into logical blocks:

  0. Why is there anything to optimize?

  1. OS Optimization (FreeBSD)
    1.1 Transition to 7.x
    1.2 Transition to 7.2
    1.3 Switch to amd64
    1.4 Unloading the network subsystem
    1.5 FreeBSD and a large number of files
    1.6 Softupdates, gjournal and mount options

  2. Optimization of frontend (nginx)
    2.1 Accept Filters
    2.2 Caching
    2.3 AIO

  3. Backend optimization
    3.1 APC
      3.1.1 APC locking
      3.1.2 APC hints
      3.1.3 APC fragmentation
    3.2 PHP 5.3

  4. Database Optimization
    4.1 MySQL
      4.1.1 Transition to 5.1
      4.1.2 Transition to InnoDB
      4.1.3 MySQL Internal Cache - Query Cache
      4.1.4 Indices
    4.2 PostgreSQL
      4.2.1 Indices
      4.2.2 pgBouncer and others
      4.2.3 pgFouine
    4.3 Database Unloading
      4.3.1 SphinxQL
      4.3.2 Non-RDBMS Storage
    4.4 Encodings
    4.5 Asynchrony

  Appendix. Little things.
    1. SSHGuard or alternative
    2. xtrabackup
    3. Transfer mail to another host
    4. Integration with third-party software
    5. Monitoring

  6. Downsides of optimization


0. Why is there anything to optimize?


In general, you can grow in three ways: vertically (buying more powerful hardware), horizontally (adding servers), or by squeezing more out of what you already have.

The first option is best when you have a lot of money; the second, when you have a good architecture. The third, which I will describe, is used when you have neither, but still want to squeeze the most out of the available hardware.

1. OS Optimization (FreeBSD)


1.1 Transition to 7.x


What do we get when upgrading to a new version of FreeBSD?
For me the most important thing is:
The new ULE 3.0 scheduler and jemalloc are quite useful on multi-core (>= 4) systems.
MSI (Message Signaled Interrupts), often referred to in drivers as Fast Interrupts.
So if you have a legacy 6.x system that is starting to sag under load, it may be worth migrating it to 7.x.


1.2 Transition to 7.2


Superpages, increased KVA, optimized default sysctl values. All of this you get absolutely free just by moving to the latest release of the OS.

Progress does not stand still either: FreeBSD 8.0 is now being prepared for release, and we are promised further performance gains there. As proof of its stability, www.FreeBSD.org was moved to FreeBSD-CURRENT around the time of the first beta versions. So you can already start test-driving it on staging machines.


1.3 Switch to amd64


By moving to amd64 you additionally get a much larger KVA and shared memory segments > 2 GB. However, this is not the most important thing...

Note that in 2009, 4 GB of memory is routinely installed even in laptops, and it is frankly ridiculous to put so little in a database server. Of course, for a small database it is fine, but what do you do when the database grows and stops fitting in memory? On an i386 OS, adding more memory is problematic, since PAE is a glitch of its own. Besides, 64-bit integers have long been in heavy use, and they give a performance gain to applications such as databases and OpenSSL. (If anyone has links to decent "*** SQL i686 vs amd64" benchmarks, throw them in the comments.)


1.4 Unloading the network subsystem


Here FreeBSD offers not just a field for experiments, but a whole proving ground.
The optimization can be divided into two parts: tuning ifconfig parameters and sysctl.conf/loader.conf; let's go in that order.
First you need to see what your network cards are capable of at all; for this you can use the following command:
  # ifconfig -m
  capabilities=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>

If you have a good em-class (Intel Gigabit) or bge-class (Broadcom Gigabit) network card, you can try the following ifconfig options:
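The original list of options did not survive translation; as a hedged sketch based on the capabilities shown above (the interface name em0 and the exact flag set are assumptions; check man em and test under load first):

  # enable hardware checksum offload and TCP segmentation offload
  ifconfig em0 rxcsum txcsum tso4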

Also, with an em network card and a multi-core processor, you can try the drivers from Yandex, which process packets in several threads. A lot more about tuning the network subsystem can be found at nag.ru

If you have a third-rate network card (re/rl/sk/nfe...), the above options may not work correctly and may even hang the server, so it is better to settle for polling.

And finally, I recommend everyone have a look at the updated version of FreeBSD 7 tuning "according to Sysoev" and my list of sysctl values with comments.


1.5 FreeBSD and a large number of files


FreeBSD has a great technology for caching file names within a directory. If you have a lot of files in one directory, it is much better to look them up in a hash table than to constantly walk the whole tree breadth- or depth-first in search of the desired file. However, the maximum amount of memory allocated for dirhash (as this technology is called) is limited by vfs.ufs.dirhash_maxmem and by default is a mere 2 MB, which is very little. It is recommended to keep increasing it until vfs.ufs.dirhash_mem stops hitting the ceiling.
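A hedged /etc/sysctl.conf sketch (the 64 MB value is an assumption; raise or lower it while watching vfs.ufs.dirhash_mem):

  # raise the dirhash memory ceiling from the ~2 MB default
  vfs.ufs.dirhash_maxmem=67108864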


1.6 Softupdates, gjournal and mount options


New terabyte drives are just gorgeous: they are cheap, and their performance is a song. However, there is one caveat: when the power goes out at the data center, an fsck of such terabytes may take more than an hour. You can solve this problem with softupdates, or you can attach journaling to the filesystem via gjournal. Which exactly is up to you.
A couple of tips on journaling: to avoid losing performance, it is better to place the journal partition on a separate disk, and to avoid catching a panic because of its overflow, it is better to make the journal partition larger (for example, RAM + swap).
If you have a RAID with a BBU, or you simply have nothing to lose, you can add the async option to /etc/fstab. And an option such as noatime can be recommended to almost everyone. (Read giner's comment here.)
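A hedged /etc/fstab sketch (the device names are invented; use async only when a battery-backed RAID or disposable data makes it safe):

  # noatime is safe for almost everyone; async only with a BBU
  /dev/ad4s1f   /usr   ufs   rw,noatime         2   2
  /dev/ad4s1e   /tmp   ufs   rw,noatime,async   2   2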


2. Optimization of frontend (nginx)


Frankly, I am strongly opposed to excessive and/or premature optimization, and frontend optimization often falls into that category. Usually, on web projects where nginx does more than just serve statics, it consumes 1-5% of the CPU, depending on how it is used; the rest is eaten by PHP.
However, optimizing the nginx config can affect the overall response time of the site, so there are points worth talking about.
From the standard optimizations I can recommend:
  reset_timedout_connection on;
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on; 

And play around with the number of workers rather than simply setting it to the number of CPUs/disks. I also recommend everyone get acquainted with this document and the nginx best-practices topic on Serverfault; it is very likely you will learn something new.
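As a hedged sketch of the top-level knobs (the numbers are assumptions to experiment with, not recommendations):

  # nginx.conf, top level
  worker_processes  4;           # start near the number of CPUs/disks, then tune
  events {
      worker_connections  8192;  # raise for many keep-alive clients
  }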



2.1 Accept Filters


FreeBSD has a technology that allows a connection to be handed from the kernel to the process only when 1) any data has arrived, or 2) a valid HTTP request has arrived. This technology is called accept filters. Such filters help both to unload the server under a large number of connections and to protect it a little against DDoS (although the latter is better handled by ngx_http_limit_req_module, which has been written about on Habr more than once).
To enable the processing of connections using filters, you must first load the kernel module:
  # ls /boot/kernel/ | grep accf
    accf_data.ko
    accf_http.ko
  # kldload accf_http

Next, enable the httpready filter in nginx.conf:
  listen 80 default accept_filter=httpready;


2.2 Caching


Nginx has a very flexible system for caching responses, both from FastCGI and from a proxied backend. I think everyone who has read the documentation immediately came up with several scenarios for applying caching to their own project. I can only give general advice:
God forbid you serve RSS from a PHP script. If you do, you can safely cache the response for 3-5 minutes.
I think almost the entire guest version of the site can be crammed into a 5-minute cache as well (unless, of course, you run a news site).
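A hedged nginx sketch of such caching (the zone name, cache path, and the /rss.php location are assumptions for illustration):

  # cache PHP responses for the RSS feed for 5 minutes
  fastcgi_cache_path  /var/nginx/cache  levels=1:2  keys_zone=phpcache:16m;

  location = /rss.php {
      fastcgi_pass         127.0.0.1:9000;
      fastcgi_cache        phpcache;
      fastcgi_cache_key    "$request_uri";
      fastcgi_cache_valid  200  5m;
  }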

In addition to the server-side cache, there is also the client-side cache. I would recommend setting a one-month expire on all statics:

  location ~* \.(jpg|jpeg|gif|png)$ {
      root /var/nnmclub;
      expires 30d;
  }

2.3 AIO


AIO support in nginx has already been covered on Habr, and there are quite interesting discussions in the comments there. In short, AIO is useful for very specific loads, and it also helps preserve response time while reducing the number of workers.
To use aio, you need to load the aio.ko kernel module:
  # kldload aio
and then enable aio and sendfile in nginx.conf:
 
  sendfile on;
  aio sendfile; 

Newer versions of nginx allow aio to be used together with sendfile. Regarding this configuration, the documentation states:
In this configuration the SF_NODISKIO flag is used and sendfile() does not block on disk, but instead reports that the data is not in memory, after which nginx initiates an asynchronous load by reading one byte. On the first read the FreeBSD kernel loads the first 128K of the file into memory, but on subsequent reads the file is loaded in 16K chunks. Therefore this mode is best suited for distributing small files of up to 128K.

A patch for FreeBSD that addresses this is available here; perhaps in time it will land in -CURRENT and be merged back to 8.0 and 7.x.


3. Backend optimization


Here one could say a lot. For example, Java has magic strings like "-Xms768m -Xmx1280m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseCompressedOops -Djava.net..." that only our Java programmer is able to understand, while in PHP 50% of the optimization is done by opcode caching, and the rest comes from caching responses from the database. So this part of the topic will be very brief.

3.1 APC


Facebook's developers have talked about optimizing APC. I highly recommend reading it if you have time.

3.1.1 APC locking


The old file locking is exactly the "brake" because of which eAccelerator gets used instead of APC. So the default locking is often recommended to be changed to a spinlock or a pthread mutex. As far as I remember, pthread mutex became the default as of 3.0.16, so if you have a server with an old APC, I recommend updating it.

3.1.2 APC hints


If you have a lot of .php files, or you cache a lot of data in the APC user cache, you will very likely have to raise the values of apc.num_files_hint and apc.user_entries_hint, respectively, in php.ini. These values determine the sizes of APC's hash tables (in fact they are doubled before use), and we know that a hash table performs very poorly at a load factor >= 0.75.
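A hedged php.ini sketch (the numbers are assumptions; size them above your actual counts of files and user-cache keys):

  ; APC hash-table sizing hints
  apc.num_files_hint    = 10000   ; more than the number of .php files
  apc.user_entries_hint = 20000   ; more than the number of user-cache entries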


3.1.3 APC fragmentation


Fragmentation in APC is the kind of thing that makes you want to take this very cache, crumple it up, and throw it out the window. APC is not a replacement for a proper key-value store due to its inability to expire entries by TTL or evict them LRU-style automatically. That is, there is no GC, and entries that got into the cache can leave it in only two cases:

Summing up, one can say: high fragmentation is a sign that you are misusing APC.
A note must be added here: judging by the comments here, a lot was fixed in 3.1.x in terms of memory allocation, but judging by this, 3.1.x does not work for everyone.



3.2 PHP 5.3


Everything seems simple here: update PHP, get a performance boost. However, looking at the list of functions deprecated in 5.3, you may be horrified to realize they are still in use everywhere.
For all the apparent simplicity, I think the migration from 5.2 to 5.3 will take a long time, especially in production.


4. Database Optimization


In fact, the best DB optimizations in our club are:

However, most of the above tools require a fairly serious rewrite of the application.


4.1 MySQL


There are quite a few manuals on MySQL optimization; some are literate, some not so much. In any case, over the lifetime of any web project its database will manage to run up against memory, disk, and maybe even the CPU, so you cannot get by with a simple howto: you will have to watch conference talks, learn to use profilers (oprofile, systemtap, dtrace), and use a lot of additional software. In other words, not only understand what indexes, sorting, and grouping are, but also how MySQL uses them internally; know what EXPLAIN and the Query Cache are, and the advantages and disadvantages of the various storage engines; in general, be a 100% DBA for your project.
Next, I will describe ways to optimize MySQL with minimal code changes (or even without them).
As I said in the previous part, 50% of MySQL tuning can be done in semi-automatic mode with just two utilities:

4.1.1 Transition to 5.1


The move to 5.1 brings a lot of bonuses; I was particularly interested in:

All of this can add performance, so it is worth moving to 5.1. As for the noise about 5.1 being unstable, that is no longer relevant, and it was overblown from the start (if anyone has contraindications or a bad experience of switching to 5.1, please share in the comments).
The most extreme, of course, have long been testing 5.4; they say performance is raised very nicely (not without the help of patches from Google and Percona, I suspect). But 5.4 is still far from production.


4.1.2 Transition to InnoDB


Tell me, if you use MyISAM, then why? I do not understand how a production server can live on MyISAM (though such servers are said to exist), where there are no transactions (if the server dies during a large UPDATE, half of the data will be changed and half will not), there is a TABLE LOCK (during a write the table is locked for reading, and vice versa), and a REPAIR after the data center loses power can take dozens of hours. The only thing that saves MyISAM is its Fulltext Index, but even that can hardly compete in quality and speed with sphinxsearch.


Yes, InnoDB has its drawbacks (deadlocks, larger indexes, no FTS), but IMHO the bonuses outweigh them many times over. First, InnoDB is fully ACID-compliant, which means any operation, even one such as dumping the database, can be performed in a single transaction (the mysqldump --single-transaction option). Second, it has row-level locking (versus the TABLE LOCK of MyISAM), which means data can be read and written simultaneously by several threads without blocking each other.
Again, people completely concerned about the performance of their startup's heart can use XtraDB; they say it helps a lot on I/O-bound workloads.
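A hedged sketch of such a non-blocking dump (the database name is invented):

  # consistent InnoDB dump in a single transaction, without locking tables
  mysqldump --single-transaction --quick forum > forum.sql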


4.1.3 MySQL Internal Cache - Query Cache


Query Cache is one of the most misunderstood parts of MySQL. Many set it to 512 MB and think "now everything will fly"; many disable it altogether because "it doesn't work anyway". I will try to shed light on this parameter. To begin with: bigger is not better in this case, so do not overdo it. Next, it must be made clear that the Query Cache is a completely non-parallel subsystem, so with >= 8 processors it is better to disable it, because it will only slow things down. And finally, but not least: the essence of the Query Cache is that any of its contents belonging to a table are completely invalidated whenever any change is made to that table. That is, in practice the Query Cache gives a performance boost only on rarely modified tables.
Read more about QC here .
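A hedged my.cnf sketch (the sizes are assumptions; compare Qcache_hits against Com_select before and after changing them):

  # a modest query cache; set query_cache_size = 0 on many-core boxes
  query_cache_type  = 1
  query_cache_size  = 32M
  query_cache_limit = 1M    # do not cache huge result sets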


4.1.4 Indices


Just as the absence of an index is detrimental to SELECT, too many indexes are detrimental to INSERT/UPDATE. It often happens that an old, once-useful index lives on in the database for years, taking up precious memory and slowing down data changes. A simple SQL query comes to the rescue. Many similar tips can be found here.
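The query itself did not survive translation; as a hedged substitute, a sketch that lists all secondary indexes per table so you can review them for dead weight (no extra plugins required):

  -- list secondary indexes for manual review
  SELECT table_name, index_name,
         GROUP_CONCAT(column_name ORDER BY seq_in_index) AS cols
  FROM information_schema.statistics
  WHERE table_schema = DATABASE()
    AND index_name <> 'PRIMARY'
  GROUP BY table_name, index_name;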



4.2 PostgreSQL


For me, Postgres remains a rather strange system: on the one hand, it is an enterprise-class database and Skype runs on it; on the other, its default settings are such that it could run even on my cell phone. In short, it must be tuned, and here you can tune almost everything: of the nearly 200 available parameters, about 45 are responsible for the main tuning =)
By the way, another thing that struck me: if you comment a line out of the config, PostgreSQL does not reset that setting to its default but keeps using the value it "remembers"... Or did it just seem that way?
There is a lot on tuning Postgres on the internet (some of the manuals are outdated, so pay attention to the publication date and to the keyword vacuum_mem, which has been replaced by maintenance_work_mem in newer versions). There is something like a FAQ covering just the essentials, and there are very profound treatises for advanced database administrators... I will cover only the basics that will help the project stay on its feet while the admin and the programmer look for a qualified DBA.


4.2.1 Indices


Here some may have a fair question: why did MySQL have indexes in last place, while PostgreSQL has them in first? It is simple: its capabilities in this regard are much richer than MySQL's. B-tree, hash, GiST, and GIN, as well as multicolumn, partial, and expression indexes: a person who programs against PostgreSQL must understand all of this. And not just know that they exist, but understand when to use one kind of index and when another.
Something useful for monitoring SQL queries (including index statistics) can be found here.
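A hedged sketch of the less common kinds (the table and column names are invented for illustration):

  -- partial index: index only the rows you actually query
  CREATE INDEX idx_topics_open ON topics (updated_at) WHERE closed = false;

  -- expression index: serve WHERE lower(title) = ... without a seq scan
  CREATE INDEX idx_topics_title ON topics (lower(title));

  -- find indexes that have never been scanned (removal candidates)
  SELECT relname, indexrelname FROM pg_stat_user_indexes WHERE idx_scan = 0;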


4.2.2 pgBouncer and others.


pgBouncer (or an alternative) is the first thing to install on a database server. I have seen plenty of cacti graphs showing a tenfold drop in load from simply installing a connection manager. If you have not installed a connection pooler, then a separate process is launched for every connection to the database; that process eats up at least work_mem and starts fighting for the CPU and the disk with its siblings that are executing SQL queries. That would all be fine, but when the number of such processes exceeds 200-500, even a very powerful server gets tight. Very tight. pgBouncer saves us from this.
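A hedged pgbouncer.ini sketch (the database name, ports, and pool sizes are assumptions):

  ; hundreds of client connections share a small pool of real backends
  [databases]
  forum = host=127.0.0.1 port=5432 dbname=forum

  [pgbouncer]
  listen_port       = 6432
  pool_mode         = transaction  ; release the backend after each transaction
  default_pool_size = 20
  max_client_conn   = 1000

Note that transaction pooling is itself an assumption suited to simple web workloads; session-level features such as prepared statements may require pool_mode = session.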
Also, a list of useful or even irreplaceable applications for working with PostgreSQL can be found on the postgresqlrussia.org website.


4.2.3 pgFouine


pgFouine is exactly one of those indispensable programs: a very advanced analogue of mysqlsla, written in PHP. Together with Playr (a replayer of production logs), it allows you to optimize queries on staging servers in almost "combat" conditions.


4.3 Database Unloading


As I already said, the best way to optimize the database and increase its performance is to query it as rarely as possible.


4.3.1 SphinxQL


In the previous article I mentioned that we introduced search based on sphinxsearch. But not everyone is able to comb through thousands of lines of code to start using SphinxAPI, and then spend another 1-2 iterations on testing and catching bugs. The problem has now been solved very elegantly: SphinxSearch has learned to impersonate a MySQL server. That is, to start using it you only need to write sphinx.conf, create indexer entries in cron, and point the search at another "MySQL-like" database. There is a chance that no further code edits will be needed at all.
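A hedged sketch of that impersonation (9306 is the conventional SphinxQL port; the index name below is invented):

  # sphinx.conf: make searchd speak the MySQL wire protocol
  searchd
  {
      listen = 9306:mysql41
  }

After which any MySQL client can query it, e.g. mysql -h 127.0.0.1 -P 9306 followed by SELECT * FROM forum_index WHERE MATCH('starcraft 2');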
What do you get by moving to Sphinx? Besides an improvement in the speed and quality of search, you can get rid of MyISAM and its FTS and, most interestingly, come up with new applications for your search; we combined search with RSS, which turned out to be very convenient.
And here are a couple of examples of how you can benefit from having sphinxsearch (invented almost on the fly, so do not judge harshly):

Example 1: I have long been waiting for Starcraft 2, so I can always add an RSS feed like rss.php?q="starcraft 2" to Google Reader and see all the messages that discuss it. Also, as a forum member, I want to see all posts where my nickname is mentioned. Also not a problem; you just need to adjust the URL.

Example 1.5: A user came to the site, saw the search box, entered a query, and nothing was found, but a "Subscribe to this search" link appeared, pointing to the RSS for that query. The user will not leave you that easily =))

Example 2: I want to find the movie "21" or, God forbid, the movie "9". MySQL is not suicidal; it simply drops such a request, complaining that ft_min_word_len is longer than the query itself. Sphinx will almost instantly return a result like "Twenty One / 21", and in the second case will even offer a choice between "District No. 9 / District 9" and "Nine / 9".



4.3.2 Non-RDBMS Storage


There are plenty of places in a project where you do not need a relational database and a simple key-value store will do. Thankfully, there are now enough of them to choose from.
There are also very interesting projects such as Hive (a data warehouse with an SQL-like query language and Hadoop as the backend). In general, there is plenty of room for experimenting with data storage; the world is not limited to Oracle (MySQL, PostgreSQL, FoxPro, underline as appropriate).
Key-value stores, thanks to their speed, are also used to cache result sets from relational databases. Regarding caching itself: after flipping through the presentation, watching the video, and reading the full text of the report from Andrei Smirnov's blog, I have practically nothing to add. Just a couple of tips:
If you have a really big PHP project, do not forget that the opcode cacher can also store user data. You can keep the most frequently used global variables in it: first, there are not many of them and they are small, so they will eat little memory; second, fetching them is faster than from a memcached on a neighboring machine.
And the most interesting part: in large projects there have been cases when a block of global variables was written to a single machine in the memcached farm; since all the backends use these variables, traffic to that machine grew to indecent sizes, the machine began to slow down badly, and all the backends with it. The way out of this situation is either to store global variables in an opcode cacher of the APC/eAccelerator kind, or to clone the variables to all servers in the memcached farm and add exceptions to the consistent-hashing algorithm.



4.4 Encodings


A small note about the encodings:
UTF-8 is good in every way but one: Russian text takes exactly twice as much space in it, so sometimes it is worth thinking twice before using it if your audience is entirely single-language.


4.5 Asynchrony


In fact, synchronous data processing is not always required; quite often it can be replaced with asynchronous processing.
Asynchronous processing helps to 1) improve the response time of the site/application and 2) reduce the load on the server.
If the first point is obvious, the second follows from the fact that batched requests execute faster than single ones.
Asynchrony can be organized in different ways. Large projects use message queues for this (ActiveMQ, RabbitMQ, ZeroMQ; AMQP has been written about on Habr several times); small ones can get by with cron.
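A hedged cron sketch of the poor man's queue (the paths are invented): requests are merely recorded when they arrive, and a job processes the accumulated batch once a minute:

  # crontab: flush the accumulated jobs in one batch every minute
  * * * * *  /usr/local/bin/php /var/www/cron/process_queue.php >> /var/log/queue.log 2>&1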


Appendix. Little things.


1. SSHGuard or alternative.


Besides it being standard practice to set up anti-brute-force protection for ssh, it also helps protect the server from sudden Load Average spikes when completely shameless bots attack it and begin brute-forcing login-password pairs by the tens of thousands.


2. xtrabackup


LVM snapshots are a brake. mysqldump locks tables and makes text backups that can take weeks to restore. A very good backup tool for MySQL is Percona xtrabackup. I will not describe it in detail, since the Russian-speaking internet has already heard plenty about it. In short, xtrabackup is a tool that performs non-blocking binary backups of InnoDB/XtraDB tables and has a bunch of settings. Why "very good" and not "great"? Because the best tool, in my opinion, is ZFS clones. They are made instantly, and restoring the database from one is just a matter of changing the path to the files in the MySQL config; and even if the restore fails, you can roll back. Clones can also restore the entire system, for example after an unsuccessful kernel upgrade.
By the way, it seems we are promised built-in online backup in 5.4.


3. Transfer mail to another host


This optimization seems minor; however, with a large amount of spam flowing to the server, it helps greatly reduce traffic and save many IOPS.


4. Integration with third-party software


Once on Habr I skimmed an article about online games which said that for every task you should use the most suitable tool (does anyone remember what it was called?). So, for example, to let users exchange messages with attachments, you do not need to start writing a PHP script with a database/filesystem backend; you can take what real life uses for text messaging, the smtp/imap stack, and write a simple adapter for it. By analogy, a chat for users can be organized on the basis of a Jabber server with a JavaScript client, which will not load the server at all. Need to mark objects on a map? It is easier to write a mashup using Yandex/Google maps. What is interesting is that such systems, written as adapters over ready-made products, often scale very well, at least orders of magnitude better than PHP-and-MySQL solutions.


5. Monitoring


There is nothing to optimize if you do not know the current state. Performance metrics, latencies, free resources: all of this should be monitored, logged, and preferably drawn on charts. Thankfully, there are enough tools: Nagios, Zabbix, Cacti, Munin...

Take any of them, install it on the server(s), and watch the influence of your optimizations on the server load. Monitoring will also help you anticipate performance problems.


6. Downsides of optimization


Bleeding edge is, in fact, not called that by accident. When the club moved to a new server, we learned this the hard way, having managed to find bugs in almost everything (APC1, APC2, MySQL, nginx, xbtt); thanks to OpenSource, some things could simply be fixed.


Instead of epilogue


Well, that seems to be all. I typed this for almost a week... I covered what I could; behind the scenes remain ZFS, distributed filesystems, replication, and sharding, since those are topics for separate posts. My grammar and punctuation are quite bad, even though I did, of course, check everything several times in Word and a tautology checker, so if you find something, write me a PM and I will fix it.

Criticism of the article is welcome, because if you find a flaw in the post, you most likely have found a flaw in one of my projects.

Source: https://habr.com/ru/post/70167/

