
Squeezing out the gigabits, or an undocumented ViPNet Client / Coordinator feature

Hello. This post is about how a spontaneous experiment and a few hours spent in the server room produced some interesting results on the performance of the Windows version of the ViPNet Custom 4.3 solution.



Interested? Then look under the cut.
1. Introduction

Initially, this testing was just our own internal experiment. But since the results seemed quite interesting to us, we decided to publish the findings.

What is ViPNet Custom? In short, it is a firewall and traffic-encryption suite developed by InfoTeCS in accordance with the requirements of the Federal Security Service (FSB) of Russia and the Federal Service for Technical and Export Control (FSTEC) of Russia.

At the time of testing, there were no published performance measurements for the software version of ViPNet Custom on various configurations. It was interesting to compare the software implementation of ViPNet Custom against the hardware ViPNet Coordinator HW1000 / HW2000 appliances, whose performance is known and documented by the vendor.

2. Description of the original test bench

According to infotecs.ru, the website of OJSC InfoTeCS, the most powerful configurations of the HW platforms have the following characteristics:

Type | Platform | CPU | Bandwidth
HW1000 Q2 | AquaServer T40 S44 | Intel Core i5-750 | up to 280 Mbps
HW2000 Q3 | AquaServer T50 D14 | Intel Xeon E5-2620 v2 | up to 2.7 Gbps


Unfortunately, no additional information about the testing conditions was provided.

Initially, we gained access to hardware with the following characteristics:
1) IBM System x3550 M4, 2 x E5-2690 v2 (10 cores each), 64 GB RAM;
2) Fujitsu TX140 S1, E3-1230 v2 (4 cores), 16 GB RAM.

Whether to use the Fujitsu server in the experiment was initially in question: the desire to storm 10GbE right away was blocked by the absence of a 10-gigabit network adapter in this machine, so we decided to keep it as the starting point.

On the IBM server, we set up two virtual servers connected by a virtual 10-gigabit switch on the ESXi 6.0u1b platform, and later evaluated the combined performance of the two virtual machines.

Test bench description:

1. IBM System x3550 M4 server, 2 x E5-2690 v2 (10 cores each), 64 GB RAM, ESXi 6.0u1.
For each virtual machine (VM), one physical processor with 10 cores is allocated.

VM1: Windows 2012 R2 (ViPNet Coordinator 4.3_(1.33043)):
• 1 CPU, 10 cores;
• 8 GB RAM.
VM2: Windows 8.1 (ViPNet Client 4.3_(1.33043)):
• 1 CPU, 10 cores;
• 8 GB RAM.

The VMs are connected to a 10 Gbps virtual switch, with MTU 9000 configured.

2. Fujitsu TX140 S1 server, E3-1230 v2 (4 cores), 16 GB RAM, Windows 2012 R2, ViPNet Client 4.3_(1.33043).

The IBM and Fujitsu physical servers are connected by a gigabit network with MTU 9000. Hyper-Threading is disabled on both servers. iPerf3 was used as the load-generation software.
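As an aside, jumbo frames must be enabled in several places in a setup like this, and it is worth verifying them end to end before measuring. A minimal sketch, assuming a standard vSwitch named vSwitch1, a guest adapter named "Ethernet0", and an illustrative peer address:

on the ESXi host:
esxcli network vswitch standard set -v vSwitch1 -m 9000

inside each Windows guest:
netsh interface ipv4 set subinterface "Ethernet0" mtu=9000 store=persistent

end-to-end check (8972 = 9000 minus 28 bytes of IP and ICMP headers):
ping -f -l 8972 192.168.10.2

If the reply is "Packet needs to be fragmented but DF set", MTU 9000 is not in effect somewhere along the path.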

The test bench layout is shown in Figure 1.



Figure 1 - Test bench layout

3. The first stage of testing

Unfortunately, work on this article began only after all the tests had been carried out, so for this section no screenshots of the results have survived, except the final one.

3.1. Test number 1

First, let's check whether we can achieve 1 Gbps of network throughput. To do this, we use the virtual coordinator VM1, Windows 2012 R2 (ViPNet Coordinator 4.3), and the physical Fujitsu TX140 S1 server.

On the VM1 side, iPerf3 runs in server mode.
On the Fujitsu server, iPerf3 is launched as a client with the parameters

Iperf.exe -c IP_server -P 4 -t 100,

where -P 4 sets the number of parallel streams, chosen to match the number of cores on the Fujitsu server. Both sides of the run are sketched below.
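For completeness, the two sides of such a run look roughly as follows (a sketch using a stock iperf3 build; the article's Iperf.exe is presumably a renamed iperf3 binary, and IP_server stands for VM1's address):

on VM1 (server):
iperf3.exe -s

on the Fujitsu server (client):
iperf3.exe -c IP_server -P 4 -t 100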

The test was conducted three times. The results are shown in table 1.

Table 1. Result of test No. 1

Host | CPU load | Throughput achieved | Channel
VM1 Windows 2012 R2 (ViPNet Coordinator 4.3) | <25% | 972 Mbps | 1 Gbps
Fujitsu TX140 S1 | 100% | 972 Mbps | 1 Gbps


Based on the results, the following conclusions were drawn:
1) the E3-1230 v2 processor can sustain 1 Gbps of network throughput while encrypting;
2) the virtual coordinator is loaded at less than 25%;
3) with a comparable processor, the official performance figure for the ViPNet Coordinator HW1000 is exceeded almost fourfold.

Judging by these data, the Fujitsu TX140 S1 server has clearly reached its performance ceiling. Therefore, further testing was carried out only with the virtual machines.

3.2. Test number 2

So, on to high speeds. Let's test the virtual coordinator VM1, Windows 2012 R2 (ViPNet Coordinator 4.3), against VM2, Windows 8.1 (ViPNet Client 4.3).

On the VM1 side, iPerf3 runs in server mode.
On the VM2 side, iPerf3 is launched as a client with the parameters

Iperf.exe -c IP_server -P 10 -t 100,

where -P 10 sets the number of parallel streams, equal to the number of cores.

The test was conducted three times. The results are shown in table 2.

Table 2. Result of test No. 2

Host | CPU load | Throughput achieved | Channel
VM1 Windows 2012 R2 (ViPNet Coordinator 4.3) | 25-30% | 1.12 Gbps | 10 Gbps
VM2 Windows 8.1 (ViPNet Client 4.3) | 25-30% | 1.12 Gbps | 10 Gbps


As you can see, the results barely differ from the previous ones. The test was repeated several times with the following changes:
• the iperf server side was moved to VM2;
• the guest OS on VM2 was replaced with Windows Server 2012 R2 running ViPNet Coordinator 4.3.

In all the combinations tested, the results stayed the same within the margin of error.
This suggested that, most likely, there is a built-in limitation in the ViPNet software itself.

After several more test variants, it turned out that when iPerf3 was started with the parameters
Iperf.exe -c IP_server -P 4 -t 100
the throughput became almost identical to what had previously been achieved on the Fujitsu server.

At the same time, four processor cores were loaded to the maximum, exactly 25% of the CPU's capacity.
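Per-core utilization of this kind can be watched from the command line as well as in Task Manager; a minimal sketch using the standard Windows typeperf tool (one-second samples, 60 of them; the counter path is standard, the sampling parameters are arbitrary):

typeperf "\Processor(*)\% Processor Time" -si 1 -sc 60

In this scenario one would expect exactly four of the per-core instances to sit near 100% under load.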

These results finally convinced us that a limitation was present. We sent them to the manufacturer with a request for a solution.

4. Continuation of the experiment

Soon an answer came from the manufacturer:
"The number of threads can be controlled through the ThreadCount value under the HKLM\System\CurrentControlSet\Control\Infotecs\Iplir registry key. If the value is -1 or not set, the number of threads is chosen equal to the number of processors, but no more than 4. If a value is set, the number of threads is equal to that value."

So the guess was correct. We configured the bench for maximum performance by setting the ThreadCount parameter to 10 on both virtual machines, as shown in the sketch below.
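Applied from an elevated command prompt, the change might look as follows. This is a sketch: the key path and value name come from the vendor's reply, while the REG_DWORD type and the need to restart the ViPNet driver (or the machine) for the value to take effect are our assumptions.

reg add "HKLM\System\CurrentControlSet\Control\Infotecs\Iplir" /v ThreadCount /t REG_DWORD /d 10 /f
reg query "HKLM\System\CurrentControlSet\Control\Infotecs\Iplir" /v ThreadCount

The same command with a different /d value covers the 1-, 14- and 20-thread configurations used later in this article.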

4.1. Test number 3

After making all the necessary changes, we run iPerf again.

On the VM1 side, iPerf3 runs in server mode. On the VM2 side, iPerf3 is launched as a client with the parameters
Iperf.exe -c IP_server -P 10 -t 100,

where -P 10 sets the number of parallel streams, equal to the number of cores.

The test was conducted three times. The results are shown in Table 3 and Figures 2-3.

Table 3. Result of test No. 3

Host | CPU load | Throughput achieved | Channel
VM1 Windows 2012 R2 (ViPNet Coordinator 4.3) | 100% | 2.47 Gbps | 10 Gbps
VM2 Windows 8.1 (ViPNet Client 4.3) | 100% | 2.47 Gbps | 10 Gbps




Figure 2 - iPerf3 output on VM1 Windows 2012 R2 (ViPNet Coordinator 4.3)



Figure 3 - iPerf3 output on VM2 Windows 8.1 (ViPNet Client 4.3)

Based on the results, the following conclusions were drawn:
1) the changes made it possible to reach maximum encryption performance with full processor utilization;
2) the combined performance of the two Xeon E5-2690 v2 processors can be considered equal to 5 Gbps;
3) counted across both processors, the resulting encryption performance is double the official figure for the ViPNet Coordinator HW2000.

These results only fueled our interest: could we get even more? Fortunately, we managed to obtain access to more powerful hardware.

It is also worth noting that throughout the testing there was no difference in throughput between ViPNet Client and ViPNet Coordinator.

5. The second stage of testing

For further research into the performance of the ViPNet software, we obtained access to two separate blade servers with the following characteristics:
• CPU: 2 x E5-2690 v2, 10 cores each;
• ESXi 6.0u1.

Each virtual machine runs on its own separate blade.

VM1: Windows 2012 R2 (ViPNet Client 4.3_(1.33043)):
• 2 CPUs, 20 cores;
• 32 GB RAM.

VM2: Windows 2012 R2 (ViPNet Client 4.3_(1.33043)):
• 2 CPUs, 20 cores;
• 32 GB RAM.

The network connection between the virtual machines goes through the blade chassis at 10 Gbps, with MTU 9000. Hyper-Threading is disabled on both servers.

To generate the load, iPerf3 was used and, in addition, NTttcp, with the following main parameters:

1) on the receiving side:

Iperf.exe -s;

on the transmitting side:

Iperf.exe -c server_ip -P 20 -t 100;

2) on the receiving side:

NTttcp.exe -r -wu 5 -cd 5 -m 20,*,self_ip -l 64k -t 60 -sb 128k -rb 128k;

on the transmitting side:

NTttcp.exe -s -wu 5 -cd 5 -m 20,*,server_ip -l 64k -t 60 -sb 128k -rb 128k.

For NTttcp, -wu and -cd set the warm-up and cool-down periods in seconds, -m 20,*,ip requests 20 threads on any core, -l sets the buffer length, -t the test duration, and -sb / -rb the send and receive socket buffer sizes.

The test bench layout is shown in Figure 4.



Figure 4 - Test bench layout

5.1. Test number 4

To begin with, let's check the network bandwidth without encryption; the ViPNet software is not installed.

The test was conducted three times. The results are shown in Table 4 and Figures 5-6.

Table 4. Result of test No. 4

Tool | CPU load | Throughput achieved | Channel
Ntttcp | 2.5% | 8.5 Gbps | 10 Gbps
Iperf | 4% | 9.3 Gbps | 10 Gbps




Figure 5 - Ntttcp test result without encryption



Figure 6 - Iperf test result without encryption

Based on the results, the following conclusions were drawn:
1) network throughput close to the 10 Gbps line rate was achieved;
2) the two load generators give noticeably different results; from here on, for reliability, results are given for both Iperf and Ntttcp.

5.2. Test number 5

Set the value of the ThreadCount parameter to 20 on both virtual machines and measure the results.

The test was conducted three times. The results are shown in Table 5 and Figures 7-8.

Table 5. Result of test No. 5

Tool | CPU load | Throughput achieved | Channel
Ntttcp | 74-76% | 3.24 Gbps | 10 Gbps
Iperf | 68-71% | 3.36 Gbps | 10 Gbps




Figure 7 - Ntttcp test result with encryption



Figure 8 - Iperf test result with encryption

Based on the results, the following conclusions were drawn:

1) on a single server, the declared performance of the ViPNet Coordinator HW2000 was exceeded;
2) the theoretical performance of 5 Gbps was not achieved;
3) CPU load did not reach 100%;
4) the two load generators still give different results, but at this point the difference is minimal.

Since the servers' processors were not fully utilized, we turned our attention to a possible limit in the ViPNet driver on the number of simultaneous encryption threads.

To check for this constraint, let's look at the load on a single processor core during an encryption operation.

5.3. Test number 6

In this test we use only Iperf, since in the previous tests it put the heavier load on the processor, with the parameters

Iperf.exe -c IP_server -P 1 -t 100.

On each server, through the registry, we restrict ViPNet to a single core for the encryption operation, as in the sketch below.
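The registry change is the same sketch as before, with the thread count pinned to one (value type and restart requirement assumed, as earlier):

reg add "HKLM\System\CurrentControlSet\Control\Infotecs\Iplir" /v ThreadCount /t REG_DWORD /d 1 /f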

The test was conducted three times. The results are shown in Figures 9-10.



Figure 9 - Utilization of a single core during the encryption operation



Figure 10 - Iperf test result with encryption, single core loaded

Based on the results, the following conclusions were drawn:

1) the single core was loaded at 100%;
2) extrapolating from the single-core load, 20 cores should give a theoretical performance of 5.25 Gbps, i.e. about 262 Mbps per core;
3) dividing the ~3.3 Gbps actually achieved by that per-core figure points to roughly 13-14 active cores, suggesting the ViPNet software is limited to 14 cores.

To check this core limit, we run one more test with 14 cores enabled.

5.4. Test number 7

Encryption testing with 14 cores.

In this test we again use only Iperf, with the parameters

Iperf.exe -c IP_server -P 12 -t 100.

On each server, through the registry, we restrict ViPNet to using no more than 14 cores during the encryption operation.



Figure 11 - Utilization of 14 cores during the encryption operation



Figure 12 - Iperf test result with encryption, 14 cores loaded

Based on the results, the following conclusions were drawn:

1) all 14 cores were loaded;
2) performance is the same as in the 20-core configuration;
3) multi-threaded encryption is limited to 14 cores.

These test results were also sent to the manufacturer with a request for a solution.

6. Conclusion

After some time, the manufacturer's response arrived:

"Using more than four cores on the Windows coordinator is an undocumented feature; its correct operation is not tested and not guaranteed."

At this point, I think the testing can be considered complete.

So why do the test results make the software version look like such a breakthrough?

Most likely, there are several reasons:
1) outdated published figures: according to unofficial data, HW performance has grown considerably on newer firmware;
2) undocumented testing conditions on the vendor's side;
3) and, of course, the maximum result here was obtained using an undocumented feature.

Any questions? Ask them in the comments.

Andrey Kurtasanov, Softline

Source: https://habr.com/ru/post/301810/

