
News about plans to conquer the server market with ARM-based systems has been appearing more and more often lately. Moreover, real server ARM processors from Calxeda, as well as systems built on them from Boston Viridis and, soon, HP's Moonshot, have already been embodied in silicon.
I have been using Intel® Atom™ servers for 4 years now, but I know ARM only from the mobile phone and tablet side. What is a modern ARM processor capable of? Can it compete with Atom? I found no direct comparison on the server front, only synthetic benchmarks on Phoronix. There was an interesting test on AnandTech, but it was against Xeons. Calxeda's own benchmark also compares against Xeons. I wanted to compare ARM with Atom specifically in a Linux + NGINX setup serving static content.
Besides netbooks and nettops, the original purpose of these processors, Atoms have taken root in my all-in-one routers (proxy, mail, firewall) for medium-sized offices, in network storage (file servers), in front ends, and in static-content servers. Why static content? Because this is the minimum sufficient processor for such tasks. And even though dear comrades write that HP knows better (than me), and that this could well be an application server or a cloud computing node, I think the caliber of the gun should at least roughly match the size of the carcass. Yes, a cloud of mosquitoes may well tear a bear apart, but it will be long and painful.
Knowledgeable people will be sure to write in the comments that you need to use CDNs for static content. Anticipating the question, I will answer: yes, you can, if the scale of the task (and the customer's wallet) allows it. I monitor prices periodically and have not yet seen anyone match the price of rented Atoms sitting on unlimited (or almost unlimited) 100-megabit uplinks. For example, 5 servers, each serving from 7 to 22 TB per month. How much would that cost on Amazon?
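As a rough sanity check of those traffic figures, here is a minimal Python sketch that converts a monthly volume into the average rate a link has to sustain. The function names, the 30-day month and the use of decimal terabytes are my assumptions for illustration, not part of the original measurements.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def average_mbit_per_s(tb_per_month: float) -> float:
    """Average rate in Mbit/s needed to serve tb_per_month decimal terabytes."""
    bits = tb_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6


def max_tb_per_month(link_mbit_per_s: float) -> float:
    """Volume in decimal TB that a fully loaded link moves in one month."""
    bytes_per_month = link_mbit_per_s * 1e6 / 8 * SECONDS_PER_MONTH
    return bytes_per_month / 1e12


if __name__ == "__main__":
    for tb in (7, 22):
        print(f"{tb} TB/month needs about {average_mbit_per_s(tb):.1f} Mbit/s on average")
    print(f"A 100 Mbit/s uplink tops out near {max_tb_per_month(100):.1f} TB/month")
```

Under these assumptions, 7 to 22 TB per month corresponds to an average load of roughly a fifth to two thirds of a 100-megabit uplink, so such servers are busy but not saturated on average.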
Choosing an opponent
For testing, you need a board with a modern multi-core ARM processor, a SATA port and at least 100-megabit Ethernet. The choice turned out to be extremely limited: either the aforementioned system from Boston Viridis for $20,000, or the SABRE Lite development board for $220. It did not take me long to choose :)
Unfortunately, at the time of ordering, the more advanced (+1 GB RAM, +WiFi, +Bluetooth) and cheaper ($129) Wandboard, built on the same processor, was not yet available, and the ultimate dream, the ARMBRIX Zero on the Samsung Exynos 5250 (ARM Cortex-A15 @ 1.7GHz), had already been canceled.
The Freescale i.MX6Q processor is based on the Cortex-A9 core, i.e. not the freshest one, but Calxeda is based on the same core, and I simply could not find anything else. In addition, I had extensive experience tinkering with the previous i.MX51 series on Chinese tablets. Having studied how the Cortex-A9 performs, and taking on trust that the A15 will be 40% faster at the same frequency, one can roughly extrapolate the results to a hypothetical Cortex-A15 system at a higher frequency.
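The extrapolation described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the 40% per-clock gain is the vendor claim taken on trust, and linear scaling with frequency is an optimistic assumption that ignores memory and I/O limits.

```python
def extrapolate_a15(a9_score: float, a9_mhz: float, a15_mhz: float,
                    per_clock_gain: float = 0.40) -> float:
    """Scale a Cortex-A9 result to a hypothetical Cortex-A15.

    Assumes the claimed per-clock gain holds for the workload and that
    throughput scales linearly with clock frequency.
    """
    return a9_score * (1.0 + per_clock_gain) * (a15_mhz / a9_mhz)


if __name__ == "__main__":
    # Example: i.MX6Q at 996 MHz versus an A15 at 1700 MHz (the Exynos 5250 clock).
    factor = extrapolate_a15(1.0, 996, 1700)
    print(f"Estimated speedup factor: {factor:.2f}")
```

Multiplying any measured i.MX6Q figure by this factor gives an upper-bound guess for the hypothetical A15 system, nothing more.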
Competitors
The Atom will be represented by two models, the D2700 and the D525.
Two models are needed to see how the system scales with frequency. Unfortunately, Intel decided that power-saving technologies (power/frequency control) are pointless in cheap desktop processors, since they consume next to nothing anyway.
The i.MX6Q will also be tested in two variants, at 996 MHz and 396 MHz; these are the frequencies supported by cpufreq.
Software and settings are identical on all systems: Gentoo Linux with a 3.x kernel, NGINX 1.4.1, OpenSSL 1.0.1e, GCC 4.8.1 (-O3 -march=native), and on ARM an additional option to generate Thumb code (-mthumb). Opinions differ on the latter, but on my system specifically, besides the code being guaranteed smaller, it is also slightly faster in most tests.
The Atom system will be tested in 32-bit mode, in 64-bit x86-64 mode, and with the x32 ABI. Why x32? It is interesting to check the debunking of myths!
The test systems and the client computer are connected by their gigabit ports to a gigabit switch.
Weighing
| | Intel Atom D2700 | Freescale i.MX6Q |
|---|---|---|
| Number of cores | 2 | 4 |
| Number of threads | 4 | 4 |
| Clock frequency | 2.13 GHz | 996 MHz |
| L2 cache | 1 MB | 1 MB |
| Word size | 64-bit | 32-bit |
| Maximum power consumption of the processor | 10 W | 3 W |
| Maximum power consumption of the board + SSD under load | 30 W | 9 W |
| Weight of the motherboard :) | 336 g | 74 g |
| Price | $52 | $40 |
Clearly, comparing these systems directly is not entirely fair: the architectures differ, and the i.MX is a self-contained SoC, while the D2700 is a classic microprocessor that needs support circuitry in the form of a south bridge and other companion chips. There are no Atoms with the same degree of integration yet, but progress is moving by leaps and bounds.
Atom's advantages include support for 64-bit mode, whose practical benefit we will check in the tests.
The i.MX6Q is based on the Cortex-A9 core, which means that, unlike its opponent, it supports out-of-order execution, plus it has 4 honest cores. It also integrates the CAAM hardware cryptographic engine, which we will try to explore as well.
In terms of total gigahertz there is near parity: 4.26 for the Atom versus 3.98 for the i.MX6Q.
By a firm decision of the judges, the rivals are recognized as being in the same weight category and are admitted to the next stage.
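The "total gigahertz" figures are simply the number of physical cores multiplied by the clock frequency from the table. A minimal Python sketch of that arithmetic follows; the function name is mine, and the metric itself is a crude yardstick that ignores per-clock efficiency and Hyper-Threading.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Crude 'total gigahertz' metric: physical cores times clock frequency."""
    return cores * clock_ghz


if __name__ == "__main__":
    atom = aggregate_ghz(2, 2.13)   # Atom D2700: 2 cores at 2.13 GHz
    imx = aggregate_ghz(4, 0.996)   # i.MX6Q: 4 cores at 996 MHz
    print(f"Atom D2700: {atom:.2f} GHz, i.MX6Q: {imx:.2f} GHz, "
          f"ratio {imx / atom:.2f}")
```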
Photo session
For scale, the photo includes a brand new 32 GB Silicon Power T10 SSD (JMicron JMF616). All test systems will be installed on it.
It is not the fastest option, but its rated read speed of 200 MB/s is more than enough to saturate a gigabit link, let alone 100 megabits.
Also pictured is an mSATA module, as a monument to human stupidity and a reminder that not every Mini-PCIe slot is mSATA. Even on Intel boards.

Qualification
Instead of SPEC-style synthetics, we will run simple tests of the subsystems on which web server performance depends.
Data was collected using the following
Training - gzip compression
Suppose the server is a front end for a PHP-FPM/FastCGI backend. Compressing the response is good manners. And it saves traffic.
The test compresses 441 files (typical pages from a phpBB forum) with a total size of 42 MB.
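A test of this kind can be approximated with a short Python sketch like the one below. It is not the script used for the measurements here: the function name, the directory argument and the compression level are assumptions, and Python's zlib overhead will differ from a native gzip binary, so only relative comparisons between machines would be meaningful.

```python
import gzip
import time
from pathlib import Path


def time_gzip(directory: str, level: int = 6) -> tuple[int, int, float]:
    """Gzip every regular file under `directory` in memory, single-threaded.

    Returns (input bytes, compressed bytes, elapsed seconds). The elapsed
    time includes reading the files, so run it on a warm page cache.
    """
    total_in = 0
    total_out = 0
    start = time.perf_counter()
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            total_in += len(data)
            total_out += len(gzip.compress(data, compresslevel=level))
    elapsed = time.perf_counter() - start
    return total_in, total_out, elapsed


if __name__ == "__main__":
    # "pages" is a hypothetical directory holding the saved forum pages.
    size_in, size_out, seconds = time_gzip("pages")
    print(f"{size_in} -> {size_out} bytes in {seconds:.2f} s")
```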

The i.MX loses, but this is a single-threaded test, so maybe things are not so bad. We will put it down to the slow memory subsystem.
Warm up - OpenSSL encryption speed with the RC4 (fast) and AES-256 (strong) algorithms
Many clients want https://, so everything, even images, has to be encrypted.
The i.MX6Q includes the CAAM cryptographic accelerator, which supports many useful things:
* Secure memory feature with enforced access control
* Cryptographic authentication
  * Hashing algorithms
    * MD5
    * SHA-1
    * SHA-224
    * SHA-256
  * Message authentication codes (MAC)
    * HMAC-all hashing algorithms
    * AES-CMAC
    * AES-XCBC-MAC
  * Auto padding
  * ICV checking
* Authenticated encryption algorithms
  * AES-CCM (counter with CBC-MAC)
* Symmetric key block ciphers
  * AES (128-bit, 192-bit or 256-bit keys)
  * DES (64-bit keys, including key parity)
  * 3DES (128-bit or 192-bit keys, including key parity)
* Cipher modes
  * ECB, CBC, CFB, OFB for all block ciphers
  * CTR for AES
* Symmetric key stream ciphers
  * ArcFour (alleged RC4 with 40 - 128 bit keys)
* Random number generation
  * Entropy is generated via an independent free running ring oscillator
  * Oscillator is off when not generating entropy, for lower power consumption
  * NIST-compliant, pseudo random-number generator seeded using hardware generated entropy
RC4 and AES-256 are among them. Hardware acceleration is very tempting. But it turned out to be somewhat more complicated. Stock OpenSSL does not use the Linux kernel crypto API. Freescale support wrote that OpenSSL could supposedly use it through the NetKey API (AF_ALG?). But it clearly does not. Then I came across the Cryptodev-linux module. I enabled /dev/crypto support in the kernel and rebuilt OpenSSL with -DHAVE_CRYPTODEV. It still did not see it. I disabled the native cryptodev, then applied the kernel patch and the OpenSSL patch from that site. And a miracle: they saw each other. But the speed was disappointingly low. In fact, there had been a warning that on modern processors encryption would most likely be faster in software. So it all works beautifully, in hardware, asynchronously, but slowly. Maybe some tweaking in the drivers will make it faster, but for now it is not an option.

The i.MX systems lose again, but not by much. The speed difference between two and four threads on the i.MX is exactly twofold, which is not surprising, since it has four honest cores. The Atom is much more interesting. In x86 mode the ratio is 1.45, i.e. Hyper-Threading adds almost one whole extra core's worth of throughput, while in 64-bit mode the ratio is only 1.3: Hyper-Threading is less effective there, but overall performance is still higher. The difference in results between the D2700 and the D525 is proportional to frequency.

Here the i.MX suddenly pulls strongly ahead. Perhaps this is because the algorithm needs much more computation while its memory requirements are lower.
In the Atom camp, the difference in Hyper-Threading behavior has become even more pronounced. The more optimal and faster the code, the smaller the gain from virtual cores. Most likely the reason is the in-order architecture: the more fully one thread loads the core's resources, the less remains for the second.
To battle
For testing, nginx was set up on all 4 systems with the same config and data. Three virtual servers share one root but have different settings: plain on port 80, SSL with RC4 encryption on port 443, and SSL with AES-256 on port 444.
Statistics were collected with the Apache Bench utility.
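For readers who want to repeat the runs, the sketch below shows one way to assemble the `ab` command lines for the three virtual servers. The host name `testhost`, the file name, the request count and the concurrency levels are placeholders I made up; only the `-n`, `-c` and `-H` options of `ab` are used, and the cipher is chosen by the server on each port as described above. The script only prints the commands, it does not run them.

```python
import shlex


def ab_command(url: str, concurrency: int, requests: int = 10000,
               gzip: bool = False) -> list[str]:
    """Build an Apache Bench argument list for one test run."""
    cmd = ["ab", "-n", str(requests), "-c", str(concurrency)]
    if gzip:
        # Ask the server for a compressed response.
        cmd += ["-H", "Accept-Encoding: gzip"]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Hypothetical host and file; ports follow the three virtual servers.
    targets = [
        "http://testhost/index.html",        # plain, port 80
        "https://testhost:443/index.html",   # SSL with RC4
        "https://testhost:444/index.html",   # SSL with AES-256
    ]
    for url in targets:
        for concurrency in (1, 10, 100):
            print(shlex.join(ab_command(url, concurrency, gzip=True)))
```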
Round One - Static HTML File
Here and below, speed is given in kilobytes per second, as reported by ab; that is, 20,000 kB/s is 20 MB/s.
/cXX denotes the number of concurrent sessions.
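To relate these kB/s figures to link speeds, a small Python helper is enough. The function names are hypothetical, and the default treats a kilobyte as 1000 bytes, matching the rounding used in the text; if ab's "Kbytes" are 1024-byte units, pass `bytes_per_kb=1024` instead.

```python
def kbytes_to_mbit(kb_per_s: float, bytes_per_kb: int = 1000) -> float:
    """Convert a transfer rate in kB/s to Mbit/s."""
    return kb_per_s * bytes_per_kb * 8 / 1e6


def link_utilisation(kb_per_s: float, link_mbit: float,
                     bytes_per_kb: int = 1000) -> float:
    """Fraction of the link used, ignoring protocol overhead."""
    return kbytes_to_mbit(kb_per_s, bytes_per_kb) / link_mbit


if __name__ == "__main__":
    rate = 20000  # kB/s, the example figure from the text
    print(f"{rate} kB/s = {kbytes_to_mbit(rate):.0f} Mbit/s")
    print(f"Share of a gigabit link: {link_utilisation(rate, 1000):.0%}")
```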

The lag of the i.MX systems is very surprising. Yes, they deliver 100 megabits (which is enough for my tasks), and even as much as 500, but why not a gigabit like the Atom? It is a simple task, the proverbial "as easy as sending two bytes". I replaced the patch cord and the port and tried different kernel settings: same result. Then I dug through specialized forums, and here is the catch: it turns out to be a hardware bug that cannot be fixed in software.
Round Two - static HTML file with compression

The Atoms win confidently. The x32 ABI shows a slight advantage.
Round Three - static HTML file with compression and RC4 encryption

The round again goes to the Atoms. x64 is hugely superior to x86. The newfangled x32 ABI gives a noticeable gain.
Round Four - Static HTML file with compression and AES-256 encryption

The 64-bit Atoms are ahead. The x64 system "takes off" quite inexplicably. The i.MX is slightly ahead of x86!
Round Five - JPEG file

Round Six - JPEG file with RC4 encryption

Round Seven - JPEG file with AES-256 encryption

Round Eight - Large 100MB File
Here came the first knockdown: the i.MX began to hang constantly. I touched the chip lid: very hot, and the sensors showed 73 °C.
The remaining three rounds went without problems.
In principle, they could have been skipped, since the results completely match the JPEG tests, which in turn are similar to the static HTML test.
Results
Atom wins on points.
But it turns out that for my ambitious task of saturating a 100-megabit link, the i.MX is enough even at 400 MHz. As a front end (serving compressed and encrypted HTML), though, it is not ready yet.
If ARM's claims about the A15's performance, including its memory subsystem, are to be believed, the Cortex-A15 would probably have been the winner.