Many people have heard of optimized, patched kernels; the two I know are Zen Kernel and pf-kernel. Besides adding new features (TuxOnIce, aufs support), they can improve performance thanks to an alternative process scheduler (BFS) and I/O scheduler (BFQ). In this post I want to compare the performance of pf-kernel with the stock kernels in Ubuntu and Arch Linux, and describe how to build and install pf-kernel on Ubuntu. I don't see much point in testing Zen Kernel: first, the project looks abandoned, and second, the two patch sets are very similar.
Tests
Arch Linux
Let's start with the Arch Linux test on a netbook.
UnixBench test results on the standard kernel (3.0-ARCH):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 3432673.5 | lps | 10.0 s | 7 | 116700.0 | 294.1 |
Double-Precision Whetstone | 821.7 | MWIPS | 10.2 s | 7 | 55.0 | 149.4 |
Execl throughput | 1048.3 | lps | 29.7 s | 2 | 43.0 | 243.8 |
File Copy 1024 bufsize 2000 maxblocks | 120834.3 | KBps | 30.0 s | 2 | 3960.0 | 305.1 |
File Copy 256 bufsize 500 maxblocks | 36417.8 | KBps | 30.0 s | 2 | 1655.0 | 220.0 |
File Copy 4096 bufsize 8000 maxblocks | 290993.0 | KBps | 30.0 s | 2 | 5800.0 | 501.7 |
Pipe throughput | 240124.9 | lps | 10.0 s | 7 | 12440.0 | 193.0 |
Pipe-based Context Switching | 21672.7 | lps | 10.0 s | 7 | 4000.0 | 54.2 |
Process Creation | 2885.9 | lps | 30.0 s | 2 | 126.0 | 229.0 |
Shell Scripts (1 concurrent) | 738.5 | lpm | 60.0 s | 2 | 42.4 | 174.2 |
Shell Scripts (8 concurrent) | 135.6 | lpm | 60.4 s | 2 | 6.0 | 226.1 |
System call overhead | 600176.7 | lps | 10.0 s | 7 | 15000.0 | 400.1 |
System Benchmarks Index Score: | 221.1 |
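For readers who want to check the table, here is a minimal sketch of how the Index column and the final score seem to be derived: each index matches score / baseline * 10, and the final score matches the geometric mean of the per-test indices. The function names `unixbench_index` and `geometric_mean` are made up for this illustration; the formulas are inferred from the numbers above.

```shell
# Hypothetical helper: index of one test = score / baseline * 10.
unixbench_index() {
    awk -v score="$1" -v baseline="$2" \
        'BEGIN { printf "%.1f\n", score / baseline * 10 }'
}

# Hypothetical helper: geometric mean of index values, one per line on stdin.
geometric_mean() {
    awk '{ sum += log($1); n++ }
         END { if (n > 0) printf "%.1f\n", exp(sum / n) }'
}

# Dhrystone row of the table above: prints 294.1
unixbench_index 3432673.5 116700.0

# All twelve indices of the stock 3.0-ARCH run; the result lands
# close to the reported overall score of 221.1.
printf '%s\n' 294.1 149.4 243.8 305.1 220.0 501.7 \
              193.0 54.2 229.0 174.2 226.1 400.1 | geometric_mean
```

Because the geometric mean is used, one test with a very large gain pulls the overall score up less than it would with a plain average.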
And here is the same test for pf-kernel (3.0-pf):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 3700926.6 | lps | 10.0 s | 7 | 116700.0 | 317.1 |
Double-Precision Whetstone | 846.1 | MWIPS | 10.2 s | 7 | 55.0 | 153.8 |
Execl throughput | 1343.2 | lps | 29.6 s | 2 | 43.0 | 312.4 |
File Copy 1024 bufsize 2000 maxblocks | 127468.0 | KBps | 30.0 s | 2 | 3960.0 | 321.9 |
File Copy 256 bufsize 500 maxblocks | 37622.9 | KBps | 30.0 s | 2 | 1655.0 | 227.3 |
File Copy 4096 bufsize 8000 maxblocks | 342606.2 | KBps | 30.0 s | 2 | 5800.0 | 590.7 |
Pipe throughput | 296672.7 | lps | 10.0 s | 7 | 12440.0 | 238.5 |
Pipe-based Context Switching | 41227.5 | lps | 10.0 s | 7 | 4000.0 | 103.1 |
Process Creation | 3969.3 | lps | 30.0 s | 2 | 126.0 | 315.0 |
Shell Scripts (1 concurrent) | 861.1 | lpm | 60.1 s | 2 | 42.4 | 203.1 |
Shell Scripts (8 concurrent) | 159.4 | lpm | 60.2 s | 2 | 6.0 | 265.6 |
System call overhead | 642005.3 | lps | 10.0 s | 7 | 15000.0 | 428.0 |
System Benchmarks Index Score: | 264.6 |
As you can see, the overall index score rose by about 20% (from 221.1 to 264.6).
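The percentage can be recomputed from the two overall scores. This is a small sketch; `gain_percent` is a name invented for this example.

```shell
# Hypothetical helper: relative gain of a new score over a base score, in percent.
gain_percent() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

# Stock 3.0-ARCH (221.1) versus 3.0-pf (264.6): prints 19.7
gain_percent 221.1 264.6
```

The same function applied to the Ubuntu scores reported further down (2580.4 and 3050.7) gives 18.2.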
Ubuntu
Now the same tests on Ubuntu.
On the standard kernel (2.6.38-11-generic):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 39162082.2 | lps | 10.0 s | 7 | 116700.0 | 3355.8 |
Double-Precision Whetstone | 9143.1 | MWIPS | 9.9 s | 7 | 55.0 | 1662.4 |
Execl throughput | 11472.2 | lps | 29.8 s | 2 | 43.0 | 2668.0 |
File Copy 1024 bufsize 2000 maxblocks | 1041722.3 | KBps | 30.0 s | 2 | 3960.0 | 2630.6 |
File Copy 256 bufsize 500 maxblocks | 327345.4 | KBps | 30.0 s | 2 | 1655.0 | 1977.9 |
File Copy 4096 bufsize 8000 maxblocks | 1730411.9 | KBps | 30.0 s | 2 | 5800.0 | 2983.5 |
Pipe throughput | 4204868.3 | lps | 10.0 s | 7 | 12440.0 | 3380.1 |
Pipe-based Context Switching | 738528.0 | lps | 10.0 s | 7 | 4000.0 | 1846.3 |
Process Creation | 32309.9 | lps | 30.0 s | 2 | 126.0 | 2564.3 |
Shell Scripts (1 concurrent) | 11023.5 | lpm | 60.0 s | 2 | 42.4 | 2599.9 |
Shell Scripts (8 concurrent) | 1425.4 | lpm | 60.0 s | 2 | 6.0 | 2375.7 |
System call overhead | 5723850.3 | lps | 10.0 s | 7 | 15000.0 | 3815.9 |
System Benchmarks Index Score: | 2580.4 |
On the pf kernel (2.6.38-pf8):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 71269301.5 | lps | 10.0 s | 7 | 116700.0 | 6107.1 |
Double-Precision Whetstone | 9175.2 | MWIPS | 9.9 s | 7 | 55.0 | 1668.2 |
Execl throughput | 12014.6 | lps | 30.0 s | 2 | 43.0 | 2794.1 |
File Copy 1024 bufsize 2000 maxblocks | 1580881.5 | KBps | 30.0 s | 2 | 3960.0 | 3992.1 |
File Copy 256 bufsize 500 maxblocks | 428842.2 | KBps | 30.0 s | 2 | 1655.0 | 2591.2 |
File Copy 4096 bufsize 8000 maxblocks | 2315055.5 | KBps | 30.0 s | 2 | 5800.0 | 3991.5 |
Pipe throughput | 4389021.4 | lps | 10.0 s | 7 | 12440.0 | 3528.2 |
Pipe-based Context Switching | 831655.8 | lps | 10.0 s | 7 | 4000.0 | 2079.1 |
Process Creation | 34789.6 | lps | 30.0 s | 2 | 126.0 | 2761.1 |
Shell Scripts (1 concurrent) | 11890.9 | lpm | 60.0 s | 2 | 42.4 | 2804.5 |
Shell Scripts (8 concurrent) | 1506.4 | lpm | 60.0 s | 2 | 6.0 | 2510.7 |
System call overhead | 5815793.6 | lps | 10.0 s | 7 | 15000.0 | 3877.2 |
System Benchmarks Index Score: | 3050.7 |
The gain was 18%, which in my opinion is quite noticeable. Why is it slightly lower than in the first test? Most likely because this test ran on x86_64, where the stock kernel already carries more architecture-specific optimizations (SSE and others) than the stock Arch kernel, which is compiled for Pentium Pro but was running on an Intel Atom.
As you can see, building your own kernel is worthwhile. The gains are about the same on two quite different processors: Intel Atom N270 and Core 2 Duo E8500.
I will not describe installing the kernel on Arch: it is as simple as it gets, and I am sure Arch users will have no trouble with it.
Build and install pf-kernel for Ubuntu
Download the kernel source for your version from kernel.org. Note: you need the version without stable patches (for 2.6.38.11, download plain 2.6.38).
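Picking the base version can be sketched as a tiny shell function. `base_version` is a hypothetical helper, and the rule for non-2.6 kernels (keep two components, for example 3.0.4 becomes 3.0) is an assumption based on the 3.x numbering scheme rather than something stated in this post.

```shell
# Hypothetical helper: strip the stable-release suffix from a kernel version.
base_version() {
    case "$1" in
        2.6.*) echo "$1" | cut -d. -f1-3 ;;   # 2.6.38.11 -> 2.6.38
        *)     echo "$1" | cut -d. -f1-2 ;;   # assumed rule: 3.0.4 -> 3.0
    esac
}

base_version 2.6.38.11   # prints 2.6.38
```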
Download the pf-kernel patch for that kernel version.
Unpack the archives and apply the patch.
    patch -p1 < (path to the pf-kernel patch)
Copy the current config into the kernel source directory.
    cp /boot/config-`uname -r` .config
Optionally, run localmodconfig, which disables all unneeded modules; this can greatly speed up the kernel build.
    make localmodconfig
If it complains that /sbin/lsmod is missing:
    sudo ln -s /bin/lsmod /sbin/lsmod
Configure the kernel.
    make menuconfig
Enable BFS, BFQ and, if you wish, TuxOnIce, and in the processor section choose the optimization for your CPU.
Apply a fix needed for kernels from kernel.org.
    sed -rie 's/echo "\+"/#echo "\+"/' scripts/setlocalversion
Clean the directory.
    make-kpkg clean
Build.
    CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-pf kernel_image kernel_headers
That's all. Install the kernel with sudo dpkg -i *.deb, reboot and select it in the bootloader.
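The build steps above could be collected into one script, roughly like the sketch below. It is meant to be run from the unpacked kernel source directory. `PF_PATCH`, `DRY_RUN`, `run` and `build_pf_kernel` are names invented for this example, and the default patch file name is only a placeholder. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: the steps from the text, in order.
# PF_PATCH is a hypothetical variable; the default file name is a placeholder.
PF_PATCH="${PF_PATCH:-../pf-kernel.patch}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_pf_kernel() {
    # Apply the pf-kernel patch, then start from the running kernel's config.
    run patch -p1 -i "$PF_PATCH" &&
    run cp "/boot/config-$(uname -r)" .config &&
    # Optional: drop modules that are not currently loaded.
    run make localmodconfig &&
    # Interactive: enable BFS, BFQ, TuxOnIce and pick the CPU type here.
    run make menuconfig &&
    # Fix needed for kernel.org sources (same sed call as in the text).
    run sed -rie 's/echo "\+"/#echo "\+"/' scripts/setlocalversion &&
    run make-kpkg clean &&
    run env CONCURRENCY_LEVEL="$(getconf _NPROCESSORS_ONLN)" \
        fakeroot make-kpkg --initrd --append-to-version=-pf \
        kernel_image kernel_headers
}

build_pf_kernel
```

The steps are chained with `&&` so that a failed patch or an aborted menuconfig stops the build instead of packaging a half-configured kernel. Installing the resulting .deb files is left as a manual step.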
UPDATE: Zen Kernel showed almost identical results, slightly better in places but by no more than 5% overall, and then it hung without completing all the tests (a test run takes about 40 minutes).
A certain Mr.z strongly doubted the correctness of the calculations; the table here shows the gain for each individual test and the average gain, not just the gain in the index. The numbers came out almost exactly the same.
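A per-test comparison of this kind can be sketched as follows, using the Index values from the two Arch Linux tables above. `per_test_gain` is a hypothetical helper, and the `name|stock|pf` input format is made up for this example.

```shell
# Hypothetical helper: read "name|stock index|pf index" lines from stdin,
# print the gain of each test in percent and the plain average of the gains.
per_test_gain() {
    awk -F'|' '
        { gain = ($3 / $2 - 1) * 100; sum += gain; n++
          printf "%-40s %6.1f%%\n", $1, gain }
        END { if (n > 0) printf "%-40s %6.1f%%\n", "Average gain", sum / n }'
}

per_test_gain <<'EOF'
Dhrystone 2 using register variables|294.1|317.1
Double-Precision Whetstone|149.4|153.8
Execl throughput|243.8|312.4
File Copy 1024 bufsize 2000 maxblocks|305.1|321.9
File Copy 256 bufsize 500 maxblocks|220.0|227.3
File Copy 4096 bufsize 8000 maxblocks|501.7|590.7
Pipe throughput|193.0|238.5
Pipe-based Context Switching|54.2|103.1
Process Creation|229.0|315.0
Shell Scripts (1 concurrent)|174.2|203.1
Shell Scripts (8 concurrent)|226.1|265.6
System call overhead|400.1|428.0
EOF
```

The average printed here is an arithmetic mean of the per-test gains, so it can differ a little from the gain in the overall index, which behaves like a geometric mean.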
For IoGa, WiseLord and gnomeby: comparing the vanilla kernel with a vanilla kernel built for its own architecture showed a performance gain, if any, within the margin of error, so there is almost no difference.