Building on our previous experience with memcached performance testing on the Sun Fire X2270 (a server based on Intel Xeon (Nehalem) processors) running OpenSolaris, we decided to run the same tests on the same server, but with RHEL5. As we noted in the post with the first test results, we used Intel Oplin 10GbE network cards to achieve the highest possible performance. As it turned out, to use this card under Linux we had to do some work on the drivers and rebuild the kernel.
- With the default ixgbe driver from the Red Hat distribution (version 1.3.30-k2 in kernel 2.6.18), the network interface simply hung while the test was running.
- So we downloaded the driver from the Intel site (1.3.56.11-2-NAPI) and rebuilt it. With it, everything worked, and the maximum throughput we achieved was 232K operations/second on the same kernel version, 2.6.18. However, this kernel version does not support multiple rings (multiple transmit and receive queues).
- Kernel version 2.6.29 supports multiple rings, but still does not include the latest version of the ixgbe driver, 1.3.56-2-NAPI. Therefore, we downloaded, compiled, and installed the new kernel and driver versions. This worked, and after a little tuning it gave us a maximum throughput of 280K ops/sec.
Results
As we already reported, the system with OpenSolaris and memcached 1.3.2 gave us a maximum throughput of about 350K ops/sec. On the same server with RHEL5 (kernel 2.6.29) and the same version of memcached, we achieved 280K ops/sec. In other words, OpenSolaris outperforms Linux by 25%!
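The 25% figure is measured relative to the Linux result. A quick check of the arithmetic, sketched in Python using only the two throughput numbers above (the helper names are illustrative), also shows the same gap expressed the other way round: Linux delivers 20% less throughput than OpenSolaris.

```python
# Peak memcached throughput from the results above, in operations per second.
OPENSOLARIS_OPS = 350_000
LINUX_OPS = 280_000


def percent_faster(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b."""
    return (a - b) / b * 100


def percent_slower(a: float, b: float) -> float:
    """How much lower a is than b, as a percentage of b."""
    return (b - a) / b * 100


if __name__ == "__main__":
    print(f"OpenSolaris vs Linux: +{percent_faster(OPENSOLARIS_OPS, LINUX_OPS):.0f}%")
    print(f"Linux vs OpenSolaris: -{percent_slower(LINUX_OPS, OPENSOLARIS_OPS):.0f}%")
```

Both percentages describe the same 70K ops/sec gap; they differ only in which system is taken as the baseline.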
Linux tuning
The following sysctl settings were used to maximize performance:
net.ipv4.tcp_timestamps = 0
net.core.wmem_default = 67108864
net.core.wmem_max = 67108864
net.core.optmem_max = 67108864
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 0
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_max_syn_backlog = 200000
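One way to apply the settings above, sketched in Python under the assumption of a Linux host with the standard /proc/sys interface: each dotted sysctl key maps to a file path by replacing the dots with slashes, and writing a value to that file (as root) changes it at runtime. The helper names (proc_sys_path, apply_settings, to_sysctl_conf) are hypothetical. The script defaults to a dry run; to keep the settings across reboots, the rendered text would normally go into /etc/sysctl.conf.

```python
from pathlib import Path

# The kernel settings listed above, as sysctl key -> value.
SETTINGS = {
    "net.ipv4.tcp_timestamps": "0",
    "net.core.wmem_default": "67108864",
    "net.core.wmem_max": "67108864",
    "net.core.optmem_max": "67108864",
    "net.ipv4.tcp_dsack": "0",
    "net.ipv4.tcp_sack": "0",
    "net.ipv4.tcp_window_scaling": "0",
    "net.core.netdev_max_backlog": "300000",
    "net.ipv4.tcp_max_syn_backlog": "200000",
}


def proc_sys_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key (e.g. net.core.wmem_max) to its file under /proc/sys."""
    return Path(root, *key.split("."))


def to_sysctl_conf(settings: dict[str, str]) -> str:
    """Render the settings in /etc/sysctl.conf syntax (loadable with `sysctl -p`)."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def apply_settings(settings: dict[str, str], root: str = "/proc/sys",
                   dry_run: bool = True) -> list[tuple[Path, str]]:
    """Write each value to its /proc/sys file (writing requires root).
    With dry_run=True nothing is written; only the planned writes are returned."""
    plan = [(proc_sys_path(key, root), value) for key, value in settings.items()]
    if not dry_run:
        for path, value in plan:
            path.write_text(value + "\n")
    return plan


if __name__ == "__main__":
    for path, value in apply_settings(SETTINGS):
        print(f"{path} <- {value}")
```

The simple dot-to-slash mapping is sufficient for the keys above, since none of their path components contain dots themselves.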
The following ixgbe driver module parameters were set (2 queues to receive and 2 to send):
RSS=2,2 InterruptThrottleRate=1600,1600
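Driver module parameters like these are usually made persistent with an "options" line (on RHEL5 typically in /etc/modprobe.conf) rather than typed on every modprobe call. The Python sketch below assumes each parameter takes a comma-separated list of integers, as written above; the helper name modprobe_options and the dictionary layout are hypothetical.

```python
# Module parameters from the text above; each value is a comma-separated list.
IXGBE_PARAMS = {
    "RSS": [2, 2],
    "InterruptThrottleRate": [1600, 1600],
}


def modprobe_options(module: str, params: dict[str, list[int]]) -> str:
    """Build an 'options <module> name=v1,v2 ...' line for a modprobe config file."""
    parts = [f"{name}={','.join(str(v) for v in values)}" for name, values in params.items()]
    return f"options {module} " + " ".join(parts)


if __name__ == "__main__":
    print(modprobe_options("ixgbe", IXGBE_PARAMS))
    # options ixgbe RSS=2,2 InterruptThrottleRate=1600,1600
```

The new values only take effect once the driver is reloaded (or the machine rebooted).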
OpenSolaris tuning
In /etc/system we set the following MSI-X parameters:
set ddi_msix_alloc_limit=4
set pcplusmp:apic_intr_policy=1
For the ixgbe interface, 4 transmit queues and 4 receive queues gave us the best performance:
tx_queue_number = 4, rx_queue_number = 4
In addition, we dedicated separate processor cores to the network interface:
dladm set-linkprop -p cpus=12,13,14,15 ixgbe0
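When the same CPU binding has to be applied from a script, building the command as an argument list avoids shell-quoting problems and keeps the property assignment as a single `cpus=...` token with no spaces around "=". This is a minimal Python sketch; the helper names dladm_cpus_command and bind_link_cpus are hypothetical, and actually running the command assumes an OpenSolaris host with sufficient privileges.

```python
import subprocess


def dladm_cpus_command(link: str, cpus: list[int]) -> list[str]:
    """Build the dladm argv that binds a datalink's processing to the given CPUs."""
    if not cpus:
        raise ValueError("at least one CPU id is required")
    if len(set(cpus)) != len(cpus):
        raise ValueError("CPU ids must be unique")
    cpu_list = ",".join(str(cpu) for cpu in cpus)
    return ["dladm", "set-linkprop", "-p", f"cpus={cpu_list}", link]


def bind_link_cpus(link: str, cpus: list[int], dry_run: bool = True) -> list[str]:
    """Run the dladm command unless dry_run is set; return the argv either way."""
    cmd = dladm_cpus_command(link, cpus)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # The binding used in the test: cores 12-15 for ixgbe0.
    print(" ".join(bind_link_cpus("ixgbe0", [12, 13, 14, 15])))
```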
Upd: Shanti's answer to why they used 2+2 on Linux: "Basically, we tried various settings. On Linux, we got the best results with 2+2 rings; on OpenSolaris, it was 4+4. We haven't gotten good performance out of it yet (it is relatively new)."
Short translation: just because.