When is Atom faster than Core?

Stuck in a traffic jam at the wheel of a car that can theoretically exceed 200 km/h, and watching ~~tricycle~~ cyclists overtake me, I started thinking... no, not about moving everyone onto bicycles, and not about solving humanity's transport problems with teleportation, but about the Intel Core and Intel Atom processors. Namely: Atom compared to Core is essentially a scooter compared to a car. It consumes less fuel and is much cheaper. On the other hand, the scooter is just as noticeably slower than the car (despite the various ways to "tune" a scooter beyond its factory settings). And yet, in traffic jams or on narrow streets, the scooter is faster. No wonder the scooter takes its name from the English "to scoot", to run away: English teenagers used it successfully to escape from the police.
Now back to the CPU. Replace "fuel" with "electricity" and "speed" with "performance", and we get a complete analogy for the behavior of Intel Atom and Intel Core. It is then reasonable to assume that there are "traffic jams" and "back streets" in which Atom will overtake Core. Let's look for them.

According to generally accepted performance measurements, Intel Core is significantly ahead of Atom. The "Performance" section of the Intel Atom article on Wikipedia delivers a harsh verdict: "about half the performance of a Pentium M processor of the same frequency".
If we compare Atom with Core, then according to tomshardware tests the Intel Core i3-530 beats the Intel Atom D510 by a crushing margin:

Test	Result
3DS MAX 2010 (rendering)	Core i3 is 4.36 times faster
Adobe Acrobat 9 (creating a PDF)	Core i3 is 4.55 times faster
Photoshop CS4 (applying a range of filters)	Core i3 is 3.8 times faster

It should be noted, though, that tomshardware is clearly biased against Atom. For example, when a task takes 1:38 on the Core i3, it is reported exactly that way: "one minute, 38 seconds". But when Atom takes 7:26, this is, according to the authors, "about eight minutes". More importantly, comparing processors with different clock speeds (a 2.93 GHz Core i3 and a 1.66 GHz Atom) without correcting ~~for the wind~~ for frequency is not meaningful. That is, the Core result should be divided by 2.93 / 1.66 ~ 1.76, which leaves Atom behind by a factor of 2.15 to 2.6.
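The frequency correction above can be sketched in a few lines of C. This is only an illustration of the arithmetic: the function name `per_clock_speedup` is made up for this example, and the model assumes that performance scales linearly with clock speed, which is a rough approximation at best.

```c
#include <assert.h>

/* Convert a raw "times faster" figure into a per-clock figure by
   dividing it by the ratio of the two clock speeds.
   Assumption: performance scales linearly with frequency. */
double per_clock_speedup(double raw_speedup, double fast_ghz, double slow_ghz)
{
    assert(fast_ghz > 0.0 && slow_ghz > 0.0);
    return raw_speedup / (fast_ghz / slow_ghz);
}
```

With the figures quoted above, `per_clock_speedup(4.36, 2.93, 1.66)` gives about 2.47, `per_clock_speedup(4.55, 2.93, 1.66)` about 2.58, and `per_clock_speedup(3.8, 2.93, 1.66)` about 2.15.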

Why is Atom slower?

The quick answer: because it is cheaper and more energy efficient, which is incompatible with high performance.
The correct answer: first, because Atom still uses the FSB, while the Core i3 has a memory controller integrated into the CPU, which speeds up data access. In addition, Atom's cache is four times smaller, and when the data does not fit into the cache, slower memory access drags down the performance of the whole program.
Second, the Atom microarchitecture is not a descendant of the Core 2 design, as in the Core i3, but Bonnell. In short, Bonnell is the ideological heir of the original Pentium: it has only 2 integer ALUs (versus three in Core) and, most importantly, no instruction reordering, no register renaming, and no speculative execution.
From this it is clear that, to help Atom overtake Core, we need to:

Take a ~~nano~~ small set of data, so that it fits in the cache.
Use float data, so that the load falls on the FPU rather than on the integer ALUs.
Wherever possible, deprive Core of the advantages of out-of-order execution.

Since the first two points are straightforward, we can run the first tests right away.
They were run on the 2.53 GHz Intel Core i5 I had at hand and on the already mentioned Atom D510. Each test is a set of calls to mathematical functions on float data with a built-in performance metric, "number of functions per second", so higher is better.
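A "functions per second" measurement of this kind could look roughly like the sketch below. It is not the author's test code: `sin_series`, `measure_rate`, `calls_per_second` and `rate_per_ghz` are hypothetical names, `sin_series` is just a stand-in for one of the tested functions, and `clock()` is used as a simple, coarse timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a tested function: a short Taylor
   polynomial for sin(x), adequate only for small |x|. */
static float sin_series(float x)
{
    float x2 = x * x;
    return x * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f));
}

/* The metric: function calls per second. */
double calls_per_second(size_t calls, double seconds)
{
    assert(seconds > 0.0);
    return (double)calls / seconds;
}

/* Rate per GHz, so that CPUs with different clocks can be compared. */
double rate_per_ghz(double rate, double ghz)
{
    assert(ghz > 0.0);
    return rate / ghz;
}

/* Call fn on every element of x, reps times, and return calls per second.
   The accumulated sum is written to *sink so that the compiler cannot
   discard the loop as dead code. */
double measure_rate(float (*fn)(float), const float *x, size_t n,
                    size_t reps, float *sink)
{
    float acc = 0.0f;
    clock_t start = clock();
    for (size_t r = 0; r < reps; ++r)
        for (size_t i = 0; i < n; ++i)
            acc += fn(x[i]);
    clock_t stop = clock();
    *sink = acc;

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        seconds = 1.0 / CLOCKS_PER_SEC;  /* too fast to time: clamp to one tick */
    return calls_per_second(n * reps, seconds);
}
```

Keeping the input array small means it stays in the cache, which matches the first item of the plan. For a meaningful number, `reps` has to be large enough for the loop to run well beyond the timer resolution.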
The tests computed trigonometric functions directly (C runtime, test "x87"), via series expansion, with code from the Cephes library, and with a vector implementation based on SSE intrinsics (tests ending in _ps). To account for the difference in clock frequencies, the results were scaled by 2.53 / 1.66 ~ 1.524.
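To give an idea of what a "_ps" style test might contain, here is a minimal sketch of a series expansion evaluated four floats at a time with SSE intrinsics. The function name `sin_series_ps` and the choice of a three-term sine series are assumptions made for this example, not the code that was measured. A scalar fallback is included for compilers that do not define `__SSE__`.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* y[i] = x[i] - x[i]^3/6 + x[i]^5/120 for n floats.
   n must be a multiple of 4; the approximation is only good for small |x|. */
void sin_series_ps(const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
#if defined(__SSE__)
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 c3  = _mm_set1_ps(1.0f / 6.0f);
    const __m128 c5  = _mm_set1_ps(1.0f / 120.0f);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v  = _mm_loadu_ps(x + i);      /* four inputs at once */
        __m128 v2 = _mm_mul_ps(v, v);
        /* Horner form: v * (1 - v2 * (c3 - v2 * c5)) */
        __m128 p  = _mm_sub_ps(c3, _mm_mul_ps(v2, c5));
        p = _mm_sub_ps(one, _mm_mul_ps(v2, p));
        _mm_storeu_ps(y + i, _mm_mul_ps(v, p));
    }
#else
    /* Scalar fallback with the same formula. */
    for (size_t i = 0; i < n; ++i) {
        float v2 = x[i] * x[i];
        y[i] = x[i] * (1.0f - v2 * (1.0f / 6.0f - v2 * (1.0f / 120.0f)));
    }
#endif
}
```

The unaligned load and store (`_mm_loadu_ps`, `_mm_storeu_ps`) are used so that the sketch works with any float array; aligned variants would need 16-byte aligned buffers.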
The tests were compiled with Microsoft Visual Studio 2008 using the default release optimization settings.

The results fully confirm that Intel Atom comes in last. The goal has not been reached, so we move on to the next item: making life harder for the out-of-order CPU.

Complicate the task

We will create an artificial test with unpredictable branches that lead into computationally heavy functions, so that the results of Core's speculative calculations are constantly discarded, i.e. turn out to be wasted work.
Like this:

    int rnd = rand() / (RAND_MAX + 1.) * 3;  // 0, 1 or 2
    if (rnd % 3 == 0) fn0();
    if (rnd % 3 == 1) fn1();
    if (rnd % 3 == 2) fn2();
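A self-contained version of this dispatch might look like the following sketch. The bodies of `fn0`, `fn1` and `fn2` are not given in the article, so the ones here are invented placeholders, and `pick_branch` and `run_dispatch` are hypothetical helper names. The random selection itself follows the snippet above.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder "heavy" functions; the real ones are not shown in the article. */
static float fn0(float x) { return (x * x + 1.0f) / 3.0f; }
static float fn1(float x) { return (x + 2.0f) * x - 0.5f; }
static float fn2(float x) { return x / (x * x + 4.0f); }

/* Map a rand() value to 0, 1 or 2, as in the article's snippet. */
int pick_branch(int r)
{
    assert(r >= 0);
    return (int)(r / (RAND_MAX + 1.0) * 3);
}

/* Run `iterations` randomly chosen calls and count how often each
   branch was taken. Returns the accumulated result so that the
   calls cannot be optimized away. */
float run_dispatch(unsigned seed, int iterations, int counts[3])
{
    float acc = 0.0f;
    counts[0] = counts[1] = counts[2] = 0;
    srand(seed);
    for (int i = 0; i < iterations; ++i) {
        int rnd = pick_branch(rand());
        float x = (float)i / (float)iterations;
        ++counts[rnd];
        switch (rnd) {
        case 0:  acc += fn0(x); break;
        case 1:  acc += fn1(x); break;
        default: acc += fn2(x); break;
        }
    }
    return acc;
}
```

Because the branch taken depends on `rand()`, the branch predictor has nothing to learn from. A compiler may turn the `switch` into a jump table, but the target is just as unpredictable in that form.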

Moreover, the functions will consist of chained calculations, so that Core cannot use instruction reordering and register renaming to compute any part of these expressions ahead of time, "out of turn". Here is the simplest example of such code:

    for (i = 0; i < N; ++i) {
        y += ((x[i]*x[i] + A) / B[i] * x[i] + C[i]) * D[i];
    }
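To make the dependency chain in this loop visible, the same expression can be written one operation per line. This is only a sketch: the function name `chain_sum` and its signature are assumptions for the example, while the arithmetic is the expression above.

```c
#include <assert.h>
#include <stddef.h>

/* Same computation as the loop above, split into steps.
   Within one iteration every step needs the result of the previous one. */
float chain_sum(const float *x, const float *B, const float *C,
                const float *D, float A, size_t n)
{
    float y = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        assert(B[i] != 0.0f);       /* avoid division by zero */
        float t = x[i] * x[i] + A;  /* step 1 */
        t = t / B[i];               /* needs step 1 */
        t = t * x[i] + C[i];        /* needs step 2 */
        t = t * D[i];               /* needs step 3 */
        y += t;                     /* needs step 4 and the previous y */
    }
    return y;
}
```

For example, with x = {1, 2}, B = {2, 1}, C = {0, 1}, D = {1, 2} and A = 1, the two iterations contribute 1 and 22, so the function returns 23.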

By the way, similar functions are used in the cephes_logf and cephes_expf tests shown above, where Core's advantage is smallest.
But despite all the obstacles, Core was still faster. The smallest lead of Core over Atom that I managed to get with various combinations of calculations and randomness was still a factor of two. That is, Atom still lags behind.

But if I had stopped there, you would never have heard about it: this post would not exist.
The next step was to compile the tests with the Intel Compiler. I used Composer XE 2011 update 9 (12.1) with the default release optimization settings, the same as with the Microsoft compiler.

The graph below shows the results of the aforementioned tests, including the rand test I added, compiled with both VS2008 and the Intel Compiler.

Look carefully. This is not an optical illusion. For four of the tests, the points on the green line, which show the Atom results for tests compiled with the Intel Compiler, are higher than the burgundy points, which show the i5 results for tests compiled with VS2008. That is, on the same code Atom really is faster than Core i5, by more than a factor of two.

Think this is an ad for the Intel compiler?
Absolutely not. I work neither in the advertising department nor in the compiler group.
It is simply an observation that optimized code can run much faster on Atom than non-optimized code does on Core. Or, put the other way round, code not optimized for Core will be slower than code optimized for Atom.
These are the very potholes and back streets that keep the car from accelerating.
Draw your own conclusions.

Source: https://habr.com/ru/post/148306/
