"Find the five differences." How the Scalable generations differ: a new batch of tests



Less than two years after the first generation was announced, Intel has introduced the second generation of Intel Xeon Scalable processors, based on the new Cascade Lake architecture. The official launch date was April 2. The company itself calls it the largest launch in its history and a strategically important one. Let's figure out what is so special about these new Scalables.

What stayed the same?



Cascade Lake processors (more precisely, Cascade Lake SP), like their Skylake predecessors, still belong to the Purley platform, now in its second generation: Purley Refresh. They are fully compatible with Skylake at the level of the sockets, chipsets and motherboards inherited from the first generation, with some caveats: for example, a new BIOS is required.
The process technology has not changed either: it is still 14 nm, albeit with optimizations.
The naming scheme for the Platinum, Gold, Silver and Bronze series remains the same, although there are now more suffixes: Y, N, V and S have been added to the existing L, M and T. In the model number, the second digit (hundreds) has changed from 1 to 2, so the successor of, say, the Gold 6140 is the Gold 6240.
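The renumbering rule above is mechanical enough to express in a few lines. This is a minimal sketch that assumes model names of the form "<Series> <four digits><suffixes>" and accepts only the suffixes listed above. `parse_model` and `successor` are hypothetical helper names. A nominal successor produced this way is not guaranteed to exist as a real SKU.

```python
import re

# Suffixes mentioned in the article: L, M, T existed on Skylake; Y, N, V, S are new.
KNOWN_SUFFIXES = set("LMTYNVS")

# Series name, four digits, optional letter suffixes, e.g. "Gold 6240" or "Gold 6238L".
MODEL_RE = re.compile(r"^(Platinum|Gold|Silver|Bronze)\s+(\d)(\d)(\d{2})([A-Z]*)$")


def parse_model(name: str) -> dict:
    """Split a Xeon Scalable model name into its parts."""
    match = MODEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    series, first_digit, generation, sku, suffixes = match.groups()
    unknown = set(suffixes) - KNOWN_SUFFIXES
    if unknown:
        raise ValueError(f"unknown suffix(es) {sorted(unknown)} in {name!r}")
    return {
        "series": series,
        "first_digit": first_digit,
        "generation": int(generation),  # 1 = Skylake SP, 2 = Cascade Lake SP
        "sku": sku,
        "suffixes": suffixes,
    }


def successor(name: str) -> str:
    """Apply the naming rule: the hundreds digit changes from 1 to 2."""
    parts = parse_model(name)
    if parts["generation"] != 1:
        raise ValueError(f"{name!r} is not a first-generation model")
    return f"{parts['series']} {parts['first_digit']}2{parts['sku']}{parts['suffixes']}"


print(successor("Gold 6140"))    # Gold 6240
print(successor("Silver 4110"))  # Silver 4210
```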

The remaining basic characteristics and the feature set have not changed. Core counts and cache sizes hold steady: up to 28 cores, 1 MB of L2 per core, and up to 38.5 MB of shared L3. The number and type of PCIe lanes are the same as before: 48 lanes of PCIe 3.0. Scalability is also unchanged: up to 3 UPI links at 10.4 GT/s and up to 8 sockets in a glueless configuration.
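The per-core share of L3 follows directly from these figures. Below is a quick arithmetic check that uses only the numbers above. It divides the shared L3 evenly across the 28 cores, which gives the nominal 1.375 MB slice per core.

```python
# Figures quoted above for the top Cascade Lake SP parts.
MAX_CORES = 28
L2_PER_CORE_MB = 1.0
L3_TOTAL_MB = 38.5

l3_per_core_mb = L3_TOTAL_MB / MAX_CORES                    # shared L3 split evenly per core
total_cache_mb = MAX_CORES * L2_PER_CORE_MB + L3_TOTAL_MB   # L2 + L3 per socket

print(f"L3 per core: {l3_per_core_mb} MB")                         # 1.375 MB
print(f"L2 + L3 per socket: {total_cache_mb} MB")                  # 66.5 MB
print(f"L2 + L3 in a 2-socket server: {2 * total_cache_mb} MB")    # 133.0 MB
```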

What's new?


There are many small updates overall, but I would single out the following more or less significant ones.

Firstly, Cascade Lake includes hardware fixes for the vulnerabilities that made headlines last year. Intel introduced combined software and hardware mitigations for variants 2 (Spectre), 3 (Meltdown), 3a and 4 (Spectre NG), and for L1TF (Foreshadow). For Spectre Variant 1, only a software patch is still offered. All of this is already present in the Intel Core i9 line. Here is how Intel presents it in the press release.
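Linux kernels can report how they classify these vulnerabilities on a particular server. Kernels that support this list a status per vulnerability under /sys/devices/system/cpu/vulnerabilities. Here is a minimal Python sketch that assumes that directory is present (older kernels lack it); the helper names are hypothetical. On a CPU with a hardware fix, the kernel would be expected to report "Not affected" rather than "Mitigation: ..." for the corresponding entry.

```python
from pathlib import Path
from typing import Dict, List

# Standard sysfs location on kernels that report CPU vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerability_status(base: Path = VULN_DIR) -> Dict[str, str]:
    """Map each vulnerability file (e.g. 'meltdown', 'l1tf') to the kernel's status line."""
    if not base.is_dir():
        return {}  # older kernel, non-Linux system, or no sysfs
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }


def summarize(status: Dict[str, str]) -> Dict[str, List[str]]:
    """Group vulnerability names by the prefix of their status string."""
    groups: Dict[str, List[str]] = {
        "not_affected": [], "mitigated": [], "vulnerable": [], "other": [],
    }
    for name, text in status.items():
        if text.startswith("Not affected"):
            groups["not_affected"].append(name)
        elif text.startswith("Mitigation"):
            groups["mitigated"].append(name)
        elif text.startswith("Vulnerable"):
            groups["vulnerable"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for group, names in summarize(read_vulnerability_status()).items():
        print(f"{group}: {', '.join(names) or '-'}")
```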


Secondly, DDR4-2933 memory is now supported, with reservations. It works only on the Platinum and Gold 62xx lines (Silver, for example, still runs DDR4-2400), and only with one DIMM per channel. With two DIMMs per channel, the frequency drops to 2666 MT/s.
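The memory-speed rules above can be written down as a small function. This sketch makes some simplifying assumptions:

- whether a SKU is rated for DDR4-2933 is passed in as a flag (check the spec sheet of the exact model);
- at most two DIMMs per channel are considered;
- every part not rated for 2933 is assumed to stay at DDR4-2400, as stated for Silver.

```python
def max_memory_speed_mts(supports_ddr4_2933: bool, dimms_per_channel: int) -> int:
    """Return the memory speed (MT/s) implied by the rules described above.

    supports_ddr4_2933: whether the SKU is rated for DDR4-2933 (an input, not derived here).
    dimms_per_channel: 1 or 2 (this sketch assumes no more than 2 DIMMs per channel).
    """
    if dimms_per_channel not in (1, 2):
        raise ValueError("expected 1 or 2 DIMMs per channel")
    if not supports_ddr4_2933:
        # Simplification: parts without DDR4-2933 support are treated like Silver (DDR4-2400).
        return 2400
    # DDR4-2933 only with one DIMM per channel; two DIMMs drop the speed to 2666 MT/s.
    return 2933 if dimms_per_channel == 1 else 2666


print(max_memory_speed_mts(True, 1))   # 2933
print(max_memory_speed_mts(True, 2))   # 2666
print(max_memory_speed_mts(False, 2))  # 2400
```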

Thirdly, Intel Optane DC Persistent Memory (DCPM) has premiered. The clearest description of what it is came from Tiskom, so I will quote it:
"Intel Optane DC Persistent Memory (DCPM) is a new class of technology that combines what we call memory and storage and is intended for use in data centers."

You may remember that Intel previously introduced Intel Memory Drive Technology for Xeon Skylake: a special hypervisor plus Optane NVMe drives. We even tested it, but the results were not inspiring, so we decided to wait for something more impressive. It seems the wait is over =)

Intel's new solution is based on DCPMM modules. They look like regular DIMMs and are electrically and mechanically compatible with them. They run at 2666 MT/s and come in 128, 256 and 512 GB capacities. At the logical level they use the DDR4-T (Transactional) protocol. According to Intel, the protocol is approved by JEDEC, but in practice only Cascade Lake memory controllers support it. In other words, non-volatile memory built on 3D XPoint technology has been put into a DDR4 DIMM form factor. Again according to Intel, 3D XPoint outperforms the widespread NAND flash by three orders of magnitude (1000 times) in both speed and endurance.

The solution is very interesting and rather ambiguous. Naturally, it comes with operational quirks (how could it not), a price tag and specific areas of application. But we will not dwell on it here: for this processor line it will not be the killer feature, and a detailed account goes far beyond the scope of today's article. As soon as our tests in all possible operating modes of this technology are ready, we will immediately roll out a longread :-)

Fourthly, Intel Resource Director Technology (RDT), Intel Speed Select Technology (SST) and Intel DL Boost have all leveled up.

I'll start with RDT. It is a mechanism for fairly fine-grained monitoring and control of application execution and resource usage. The technology is not new, but in this line it has been well polished and worked out in detail. The upshot is that a higher-priority application gets everything it needs on time, naturally at the expense of the other applications.
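One concrete RDT mechanism is Cache Allocation Technology (CAT). Linux exposes it through the resctrl filesystem (typically mounted at /sys/fs/resctrl), where each resource group has a `schemata` file with lines such as `L3:0=7ff;1=7ff`. The minimal sketch below only builds such strings and does not touch the filesystem. It assumes an 11-way L3 and one L3 cache id per socket; on a real system, read the full mask from /sys/fs/resctrl/info/L3/cbm_mask. The helper names are hypothetical.

```python
from typing import Dict

# Assumption: an 11-way L3, i.e. a full capacity bitmask of 0x7ff.
L3_WAYS = 11


def contiguous_cbm(ways: int, offset: int = 0, total_ways: int = L3_WAYS) -> int:
    """Capacity bitmask covering `ways` cache ways starting at bit `offset`.

    Intel CAT requires the set bits of a capacity bitmask to be contiguous.
    """
    if ways < 1 or offset < 0 or offset + ways > total_ways:
        raise ValueError("bitmask does not fit in the available cache ways")
    return ((1 << ways) - 1) << offset


def l3_schemata(masks: Dict[int, int]) -> str:
    """Format one L3 line of a resctrl schemata file: L3:<cache id>=<hex mask>;..."""
    return "L3:" + ";".join(f"{cache_id}={mask:x}" for cache_id, mask in sorted(masks.items()))


# Hypothetical split on a 2-socket server: a high-priority group gets 8 ways,
# everything else shares the remaining 3 (non-overlapping masks).
high_priority = {0: contiguous_cbm(8, offset=3), 1: contiguous_cbm(8, offset=3)}
best_effort = {0: contiguous_cbm(3), 1: contiguous_cbm(3)}
print(l3_schemata(high_priority))  # L3:0=7f8;1=7f8
print(l3_schemata(best_effort))    # L3:0=7;1=7
```

Writing such a line into a group's `schemata` file is what actually applies the limit. Keeping the masks non-overlapping isolates the high-priority group, which is the trade-off described above.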

Now SST. It works on the same principle, but at the core level: it lets you strictly designate a group of cores that gets higher priority than the rest. This is not its debut either, but this time its appearance is quite impressive.

And for dessert, Intel DL Boost. This innovation is a new instruction set previously known as Vector Neural Network Instructions (VNNI). It is a gizmo for AI, or more precisely for speeding up deep learning workloads (primarily low-precision INT8 inference). In fact, it is yet another AVX-512 extension.
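What VNNI adds is easiest to see in a scalar model of its core instruction, VPDPBUSD. Within each 32-bit lane it multiplies four unsigned bytes by four signed bytes and adds the sum to a 32-bit accumulator. That operation previously took three AVX-512 instructions (VPMADDUBSW, VPMADDWD, VPADDD). The Python below is only a model for illustration, not a way to use the instruction. It implements the non-saturating form; VPDPBUSDS is the saturating variant.

```python
from typing import List, Sequence


def _wrap_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit, as a non-saturating integer add does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def vpdpbusd(acc: Sequence[int], a_u8: Sequence[int], b_s8: Sequence[int]) -> List[int]:
    """Scalar model of AVX-512 VNNI VPDPBUSD.

    For every 32-bit lane i: acc[i] += sum(a_u8[4i + k] * b_s8[4i + k] for k in 0..3),
    where a holds unsigned bytes and b holds signed bytes.
    """
    if len(a_u8) != 4 * len(acc) or len(b_s8) != 4 * len(acc):
        raise ValueError("each 32-bit lane needs exactly four byte pairs")
    if any(not 0 <= x <= 255 for x in a_u8) or any(not -128 <= x <= 127 for x in b_s8):
        raise ValueError("a must be unsigned 8-bit, b must be signed 8-bit")
    out = []
    for lane, acc_value in enumerate(acc):
        dot = sum(a_u8[4 * lane + k] * b_s8[4 * lane + k] for k in range(4))
        out.append(_wrap_int32(acc_value + dot))
    return out


# Two lanes for brevity; a 512-bit register holds 16 such lanes.
print(vpdpbusd([10, 0], [1, 2, 3, 4, 255, 255, 255, 255], [1, -1, 2, -2, 127, 127, 127, 127]))
# [7, 129540]
```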

And finally, fifthly. As is traditional for Intel refreshes: more frequency, more cores :-) Both base and turbo frequencies have increased by 200-300 MHz. With some exceptions, each processor gained two cores. The maximum supported RAM capacity has also increased.
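A back-of-the-envelope way to estimate what "more cores plus more frequency" could buy in perfectly parallel workloads is the ratio of cores × clock. The sketch below uses illustrative numbers only (two extra cores and +200 MHz), not the specs of any particular model. It ignores turbo behavior, memory, caches and IPC changes, so measured results can land above or below it.

```python
def naive_multithread_gain(old_cores: int, old_ghz: float, new_cores: int, new_ghz: float) -> float:
    """First-order estimate: throughput assumed proportional to cores x clock.

    Ignores turbo, memory bandwidth, caches and IPC differences.
    """
    return (new_cores * new_ghz) / (old_cores * old_ghz) - 1.0


# Illustrative numbers only: 8 -> 10 cores, 2.1 -> 2.3 GHz.
print(f"{naive_multithread_gain(8, 2.1, 10, 2.3):.1%}")  # 36.9%
```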

Intel's work on optimizing cache and RAM usage also deserves a separate mention. It was probably done to minimize the performance impact of the patches for the Spectre and Meltdown family of vulnerabilities.

More details about the Cascade Lake architecture can be found on WikiChip; I recommend reading it. And now, our traditional testing.

Testing


Eight Intel Xeon Scalable processors took part in the testing:




Platform specifications

All processors were tested with the same base configuration.


Software: OS CentOS Linux 7 x86_64 (7.6.1810)
Kernel: 3.10.0-957.12.2.el7.x86_64
Optimizations relative to the standard installation: the kernel boot parameters elevator=noop selinux=0 were added.
Testing is performed with all Spectre, Meltdown and Foreshadow patches backported to this kernel.

The following tests were run:

  1. Geekbench
  2. Sysbench
  3. Phoronix Test Suite

A detailed description of the tests
Geekbench test

The package runs its tests in single-threaded and multi-threaded modes and produces a performance index for each mode. In this test, we look at two key indicators:

  • Single-Core Score: single-threaded tests.
  • Multi-Core Score: multi-threaded tests.

Units of measurement: abstract "parrots". The more "parrots", the better.

Sysbench test

Sysbench is a benchmark suite for evaluating the performance of various computer subsystems: CPU, RAM and storage. We ran the multi-threaded test on all cores. In this test, I measured one indicator: CPU speed events per second, the number of CPU operations per second. The higher the value, the better the performance.
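As far as we understand the sysbench source, its CPU test counts one "event" as a pass that checks every number up to `--cpu-max-prime` (10000 by default) for primality by trial division. Each worker thread runs such events back to back, and events per second are reported. The single-threaded Python sketch below only roughly mirrors that loop. Because of the GIL it does not try to reproduce the multi-threaded run, and its absolute numbers are not comparable to sysbench's.

```python
import math
import time


def cpu_event(max_prime: int = 10000) -> int:
    """One sysbench-style CPU event: trial-division primality test of every c in [3, max_prime)."""
    primes_found = 0
    for c in range(3, max_prime):
        limit = math.isqrt(c)
        for divisor in range(2, limit + 1):
            if c % divisor == 0:
                break
        else:  # no divisor found up to sqrt(c), so c is prime
            primes_found += 1
    return primes_found


def events_per_second(duration_s: float = 1.0, max_prime: int = 10000) -> float:
    """Run events back to back for about duration_s seconds and report the rate."""
    events = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        cpu_event(max_prime)
        events += 1
    return events / (time.perf_counter() - start)


print(cpu_event())  # 1228 primes between 3 and 9999
print(f"{events_per_second(1.0):.1f} events/s (one thread)")
```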

Phoronix Test Suite

Phoronix Test Suite is a very rich test suite. Almost all the tests presented here are multi-threaded; the only exceptions are the single-threaded Himeno and LAME MP3 Encoding tests.

In these tests, higher is better:

  1. John the Ripper: a multi-threaded password-cracking test using the Blowfish cryptographic algorithm. Measures the number of operations per second.
  2. Himeno: a linear solver of the pressure Poisson equation using the point-Jacobi method.
  3. 7-Zip Compression: a 7-Zip test using p7zip's built-in benchmark feature.
  4. OpenSSL: a toolkit implementing the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. Measures RSA 4096-bit performance.
  5. Apache Benchmark: measures how many requests per second the system can handle when executing 1,000,000 requests, 100 of them concurrently.
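The point-Jacobi method behind the Himeno test can be illustrated on a much smaller problem. This sketch is a simplified 2D, 5-point analogue; the actual Himeno benchmark works on a 3D grid with a wider stencil. It shows the defining property of Jacobi iteration: every sweep computes the new grid only from the previous one. The function name and the test problem are our own choices for illustration.

```python
from typing import List

Grid = List[List[float]]


def jacobi_poisson_2d(u0: Grid, f: Grid, h: float, iterations: int) -> Grid:
    """Point-Jacobi iterations for the 2D Poisson equation d2u/dx2 + d2u/dy2 = f.

    Boundary values of u0 stay fixed. Each interior point becomes the average of its
    four neighbours minus h^2 * f / 4, using only values from the previous iterate.
    """
    rows, cols = len(u0), len(u0[0])
    u = [row[:] for row in u0]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                                    - h * h * f[i][j])
        u = new
    return u


# Check against u = x^2 + y^2: its Laplacian is 4, and the 5-point stencil reproduces it exactly.
n, h = 8, 1.0 / 7
exact = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
start = [[exact[i][j] if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
         for i in range(n)]
rhs = [[4.0] * n for _ in range(n)]
solution = jacobi_poisson_2d(start, rhs, h, iterations=2000)
print(max(abs(solution[i][j] - exact[i][j]) for i in range(n) for j in range(n)))  # close to 0
```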

In these tests, lower is better: all of them measure execution time.

  1. C-Ray: tests CPU floating-point performance. The test is multi-threaded (16 threads per core), shoots 8 rays per pixel for anti-aliasing and generates a 1600x1200 image. Measures test time.
  2. Parallel BZIP2 Compression: measures the time needed to compress a file (a .tar archive of the Linux kernel source) using BZIP2.
  3. Audio encoding: the LAME MP3 Encoding test runs in a single thread. Measures test time.
  4. Timed GCC Compilation: measures how long it takes to build the GNU GCC compiler (version 8.2.0). Units are seconds.
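Some tests report scores and others report times, so the "X% faster" figures in the results need two formulas. The article does not say which convention was used. The sketch below assumes new/old - 1 for scores and old/new - 1 for times. The alternative for times, (old - new)/old, gives smaller numbers: a run that drops from 119 s to 100 s is 19% faster by the first formula but only about 16% by the second. All sample numbers are hypothetical.

```python
def percent_faster(old: float, new: float, higher_is_better: bool) -> float:
    """How much faster the new result is, in percent.

    Scores (ops/s, requests/s, points): new / old - 1.
    Times (seconds): old / new - 1, i.e. the throughput ratio implied by the times.
    """
    if old <= 0 or new <= 0:
        raise ValueError("results must be positive")
    ratio = new / old if higher_is_better else old / new
    return (ratio - 1.0) * 100.0


# Hypothetical numbers, not taken from the charts:
print(round(percent_faster(1000, 1070, higher_is_better=True), 2))  # 7.0
print(round(percent_faster(119, 100, higher_is_better=False), 2))   # 19.0
```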

This time I dropped the ffmpeg test because it no longer scales adequately to the total number of cores that modern Gold processors have in a dual-socket configuration.

Test results






In the Geekbench test, the new Scalables beat the old ones across the board in both single-threaded and multi-threaded modes. The lead is 3% to 6% in the single-threaded test and 6% to 13% in the multi-threaded one. The apotheosis is the Silver 4210, which beats the Silver 4110 by 33%.



In the Sysbench test, the difference ranges from 22% to 37%. The smallest gap, 7% in favor of the newer chip, is between the Gold 6140 and Gold 6240.



In the John the Ripper test, the Silver 4210 beats the Silver 4110 by 41%, and the Silver 4214 beats the Silver 4114 by almost 30%. Now the Golds: the Gold 6230 is 16% faster than the Gold 6130. The smallest gap, between the Gold 6140 and Gold 6240, is 7.6%.



The Silver 4210 beats the Silver 4110 by 29%, and the Silver 4214 beats its predecessor by 23%. The gaps between the two Gold pairs are 20% and 8%, respectively.



In the single-threaded Himeno test, the 200-300 MHz frequency increase shows up cleanly: the new generation leads by 2.2% to 6%.



The compress-7zip test almost exactly mirrors the John the Ripper (Blowfish) results. There is a striking gap between the Silver 4110 and Silver 4210: the 4210 is almost 35% faster than its predecessor. The Silver 4214 and Gold 6230 are 18% and 20% better than the 4114 and 6130, respectively. The smallest gap is between the Gold 6140 and Gold 6240: the newer chip is 4.7% better.



In the compress-pbzip2 test, the picture is similar to compress-7zip. The only significant difference is that the gap between the Gold 6130 and Gold 6230 has narrowed to 5.6%.



In the single-threaded encode-mp3 test, we again see the effect of the extra 200-300 MHz: the second-generation Scalables are 4% to 7% better than the first.



In the openssl test, the largest gap, 41%, is between the Silver 4110 and Silver 4210. Between the 4114 and 4214 it is 29%. The Golds show smaller gaps: 23% between the Gold 6130 and 6230, and 4.6% between the Gold 6140 and 6240. Note that the Gold 6240 is only 0.78% better than the Gold 6230.



In the Apache test, the results for the generation pairs are:
- the Silver 4210 is 40% better than the Silver 4110;
- the Silver 4214 outperforms the Silver 4114 by 36%;
- the Gold 6230 beats the Gold 6130 by 21%;
- the Gold 6240 beats the Gold 6140 by 29%.

I want to highlight the Silver 4210, Silver 4214 and Gold 6230 in particular: the Gold 6230 is only 3% better than the Silver 4210 and 1.5% better than the Silver 4214. That is, the gap is minimal. The Gold 6240 is 13% better than the Gold 6230.



In the GCC test, the new generation beats its predecessors by about 19%, 16%, 11% and 9.5% (the Silver 4110/4210, Silver 4114/4214, Gold 6130/6230 and Gold 6140/6240 pairs, respectively).



What do we end up with?

We see a significant gap between the Silver 4110 and Silver 4210: in multi-threaded tests, the new generation is roughly 20% to 40% better than the previous one. Thank you, frequencies and cores.
Between the Silver 4114 and Silver 4214, the difference is smaller: the maximum, in the Apache test, reaches 36%.

The gap narrows further up the range: the Gold 6230 beats the Gold 6130 by anywhere from 11% in the GCC test to 23% in the OpenSSL test.

And finally, the smallest gap is in the Gold 6140/Gold 6240 pair: the new chip leads the old one by 3% to 10% in most tests. The exception is the Apache test, where the difference is 28%: fewer cores, but a higher base frequency (Apache is a very interesting test in general).

Now on to the additional tests. But first, a little background.

Testing RAM


The new Intel Xeon Scalable processors of the Gold 62xx line now support faster DDR4-2933 memory. Quite logically, we asked ourselves how much memory frequency affects overall system performance. If we assume that a plus added to a plus always gives something positive, a fresh processor paired with faster memory should perform well. But it is one thing to assume and another to verify experimentally.

For the test, we took the Gold 6240 in a dual-processor configuration. The platform specifications and the software are unchanged. Memory was tested at three speeds: DDR4-2400, DDR4-2666 and DDR4-2933.
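For context, the theoretical peak bandwidth of the three memory speeds is simple arithmetic: transfers per second × 8 bytes per channel × number of channels. The sketch assumes six populated DDR4 channels per socket (the Cascade Lake SP configuration) and decimal gigabytes; real sustained bandwidth is noticeably lower. By this measure, DDR4-2933 offers about 22% more peak bandwidth than DDR4-2400 (2933/2400 ≈ 1.22).

```python
# Assumptions: 6 memory channels per socket, 64-bit (8-byte) channels, all channels populated.
CHANNELS_PER_SOCKET = 6
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mts: int, sockets: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    return mts * 1e6 * BYTES_PER_TRANSFER * CHANNELS_PER_SOCKET * sockets / 1e9


for speed in (2400, 2666, 2933):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s per socket, "
          f"{peak_bandwidth_gbs(speed, sockets=2):.1f} GB/s for two sockets")
# DDR4-2400: 115.2 GB/s per socket, 230.4 GB/s for two sockets
# DDR4-2666: 128.0 GB/s per socket, 255.9 GB/s for two sockets
# DDR4-2933: 140.8 GB/s per socket, 281.6 GB/s for two sockets
```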

It always makes me happy to have everything needed to test a hypothesis right at hand =) Now let's see what happened.

RAM test results


Too much of a good thing is bad, so I decided against drawing every graph and put the results in tables instead: more convenient and faster, if less visual. There will be charts too, but only for the results I find most interesting.









"Either we are doing something wrong, or one of the two."

This Pilot Brothers quote, slightly paraphrased, turned out to be very apt once the memory testing was completed...

As in all our tests, we took ten measurements and averaged them. As you can see, the test readings are as inconsistent as the testimony of citizen Krolikova in the film "Shirli-Myrli".
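With ten runs per configuration, it helps to look at the spread as well as the average before reading anything into small differences. This is a minimal sketch using Python's `statistics` module. The two-standard-error threshold is a common rule of thumb rather than a formal significance test. The sample numbers are hypothetical, not our measurements.

```python
import statistics
from typing import Sequence


def summarize_runs(runs: Sequence[float]) -> dict:
    """Mean, sample standard deviation and coefficient of variation of repeated runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)
    return {"mean": mean, "stdev": stdev, "cv_percent": 100.0 * stdev / mean}


def difference_is_within_noise(a: Sequence[float], b: Sequence[float], k: float = 2.0) -> bool:
    """Rule of thumb: is the gap between means smaller than k standard errors of the difference?"""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) < k * se


# Hypothetical scores from ten runs of two memory configurations.
runs_2400 = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
runs_2933 = [101, 99, 100, 102, 98, 100, 101, 99, 100, 100]
print(summarize_runs(runs_2400))
print(difference_is_within_noise(runs_2400, runs_2933))  # True: the means are equal
```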

In the Phoronix tests it is 50/50: the best results come from the DDR4-2400 and DDR4-2933 configurations. Geekbench favored DDR4-2933 in Memory Score_Single and Memory Score_Multi, but the overall result is surprising.

My guess is that higher frequency affects latency, so there is a trade-off between bandwidth and response time. But frankly, I'm not sure... If you have something to say about this, please share it in the comments.

Last time, I became convinced that leaving some of the processor's memory channels unpopulated has a greater impact on test results. In the next processor test, we will definitely examine this effect, and I will tell you what we find.

A small step for a man, but a giant leap for mankind


As Comrade Kamnoedov would say (I love the Strugatskys), it is "approximately along these lines" that Intel positions its new line of Xeon Scalable processors. At the beginning of the article I said that releasing the new Scalables is an important strategic step for Intel itself. Now I will explain why.

On the one hand, the new Scalables mark the beginning of a global upgrade of the data center platform, and a couple of interesting announcements are coming in the second half of the year. On the other hand, none of the innovations is accidental: they respond to current industry demands, and respond quite well. Not enough memory? Here is Optane DC Persistent Memory. Need hardware prioritization of processes and cores? Here are the upgraded SST and RDT. Dreamed of doing neural networks professionally? :-) Sign here for a new set of AI instructions. One can only be glad for Intel.

Personally, though, I have the impression that this release includes the wishlist Intel did not have time to implement last time. And, of course, something had to be done about the hardware holes: hunting for them had become a kind of sport for various specialists. Everything Intel took away from users with the Spectre/Meltdown holes, it has now given back while keeping prices the same.

On top of that, AMD is pressing from all sides. Its products were much less affected by Spectre/Meltdown. Lately it has been giving Intel a particularly hard time in the desktop segment (I wish I had such youthful vigor at such a venerable age), and to a lesser extent in the server segment. Speaking of servers, it will be very interesting to see how the new AMD EPYC Rome performs, since the current EPYC generation did not leave me indifferent.

But back to Scalable.

So what is the bottom line for a user who is not burdened with AI and neural networks? A clear performance increase thanks to more cores and higher base and turbo frequencies. Between the Gold generations this increase reaches at most 23%, which is good in itself, while for Silver it reaches 40% in some tests. Given the almost unchanged price, the difference is quite pleasant, although, as always, I want more =)

If we take Intel at its word that this is only the beginning, even a skeptic like me is curious to see what will come next.

The testing used servers based on Intel Xeon Scalable processors: Silver 4110, Silver 4114, Silver 4210, Silver 4214, Gold 6130, Gold 6140, Gold 6230, Gold 6240.

Until July 25, servers with the new Xeon Scalable processors can be ordered on 1dedic.ru with a 25% discount for 1 month using the promo code NEW_SCALABLE. The promo code expires at midnight on July 26, 2019.

A 10% discount on any dedicated server when paying for a year.

Tested and written for you by Trashwind, senior system administrator in the FirstDEDIC operations department.

Source: https://habr.com/ru/post/457496/

