
A bit about the performance of Cisco network equipment



This year we published two articles: a comparison of the functionality of Cisco routers and firewalls, and an overview of the separation of the control plane and data plane in network equipment. In the comments to these articles, the question of network equipment performance came up: specifically, how the performance of different generations of Cisco routers depends on which services are enabled on them. The performance of Cisco ASA firewalls was also discussed. This made me want to look at these questions from a practical angle and back up certain points with numbers. What worked out and what did not, I describe below.

By performance, we mean the throughput of the device, measured in Mbps. The test bench consisted of two laptops with iPerf3 installed. The test method is quite simple: iPerf3 was run in TCP mode with 5 parallel streams. I did not aim to determine the actual performance of the devices. That would require more sophisticated equipment, since the traffic patterns of a real network would have to be reproduced, and the number of processed packets would also have to be measured. Our main goal was to assess the impact of various services on device operation and to compare the results obtained on different devices. For that purpose, the chosen toolkit seemed quite adequate at first glance.
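The bench procedure above can be scripted. The sketch below assumes Python 3.9+ and the standard iperf3 client options `-c` (server), `-P` (parallel streams), `-t` (duration) and `-J` (JSON report). The server address and the 30-second duration are placeholders, since the article does not state them. The aggregate rate is read from the receiver-side total, `end.sum_received.bits_per_second`, in the JSON report.

```python
import json
import subprocess

# Placeholders, not taken from the article: server address and test duration.
SERVER = "192.0.2.10"   # hypothetical iPerf3 server (documentation address range)
STREAMS = 5             # the article used 5 parallel TCP streams
DURATION_S = 30         # assumed test length in seconds


def build_command(server: str, streams: int, duration_s: int) -> list[str]:
    """iperf3 client command: TCP (the default), parallel streams, JSON output."""
    return ["iperf3", "-c", server, "-P", str(streams), "-t", str(duration_s), "-J"]


def received_mbps(result: dict) -> float:
    """Aggregate receiver-side throughput over all streams, in Mbps."""
    return result["end"]["sum_received"]["bits_per_second"] / 1e6


def run_test(server: str = SERVER) -> float:
    """Run one measurement against a server started with `iperf3 -s`."""
    cmd = build_command(server, STREAMS, DURATION_S)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return received_mbps(json.loads(out.stdout))
```

To use it, start `iperf3 -s` on the second laptop and call `run_test()` on the first, once per router mode.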
Cisco Integrated Services Router (ISR) Generation 1 and 2

To start, two entry-level Cisco routers, the 871 and the 881, were taken off the shelf. These routers belong to different generations (the 871 is the older G1, the 881 the newer G2) and are typically deployed in small offices, for example in a company's remote branches.

The routers under test have similar software and hardware architectures: the operating system is Cisco IOS, and the "brain" of the device is the MPC8272 SoC in the 871 and the MPC8300 SoC in the 881.

For each router, the following modes of operation were checked:
* For the NAT tests, both static and dynamic NAT were configured. Both variants had roughly the same effect on device performance.

Testing covered traffic routing (L3 switching) with CEF and with Process Switching. On the devices under test, both modes process packets in software. The difference lies in how the router decides where to send a packet. With Process Switching, for every packet the router determines where to forward it and builds or modifies the required headers within a separate process, based on the routing table and L2 tables. This is so-called process-level (processor) processing. With CEF, the router uses specially prepared FIB (prefix table) and Adjacency (neighbor data table) tables, which can significantly reduce CPU load and increase packet processing speed inside the device.
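To make the difference concrete, here is a toy Python model. It does not reproduce Cisco's actual data structures, and the routes, next hops and MAC address are hypothetical. The process-switching path walks the routing table recursively for every packet until it reaches a connected next hop, then consults the ARP table. The CEF-like path resolves every prefix once, ahead of traffic, so each packet needs a single FIB lookup that returns a ready adjacency.

```python
import ipaddress

# Hypothetical toy tables, not real IOS data structures.
# Routing table: prefix -> next-hop IP, or "connected:<interface>".
RIB = {
    "192.168.1.0/24": "connected:Fa4",
    "172.16.0.0/16": "192.168.1.2",   # next hop on a connected subnet
    "10.0.0.0/8": "172.16.5.5",       # recursive: next hop reached via another route
}
ARP = {"192.168.1.2": "aa:bb:cc:00:00:02"}  # next-hop IP -> MAC (toy: next hops only)


def lpm(table, ip):
    """Longest-prefix match over a dict keyed by prefix strings."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return None if best is None else table[str(best)]


def process_switch(dst):
    """Per packet: resolve routes recursively, then look up ARP.

    Returns (interface, mac, number_of_routing_table_lookups)."""
    target, lookups = dst, 0
    while True:
        route = lpm(RIB, target)
        lookups += 1
        if route is None:
            raise LookupError(f"no route to {dst}")
        if route.startswith("connected:"):
            return route.split(":", 1)[1], ARP[target], lookups
        target = route  # recurse on the next hop


def build_fib():
    """Done once, before traffic: pre-resolve each remote prefix to an adjacency."""
    fib = {}
    for prefix, route in RIB.items():
        if not route.startswith("connected:"):
            iface, mac, _ = process_switch(route)
            fib[prefix] = (iface, mac)  # adjacency: egress interface + L2 rewrite
    return fib


FIB = build_fib()


def cef_switch(dst):
    """Per packet: a single FIB lookup returns a ready adjacency."""
    return lpm(FIB, dst)
```

For `10.1.2.3`, `process_switch` performs three routing-table lookups per packet, while `cef_switch` performs one and reaches the same adjacency.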

For a more visual comparison, data on different devices are plotted on one graph (Figure 1).


Note the main points:
  1. Since the interfaces on the devices are FastEthernet, the maximum end-to-end throughput measured with iPerf3 did not exceed 95 Mbps. At the same time, in some modes the CPU utilization did not reach its peak, which means that 95 Mbps is not the limit for these routers.
  2. The 881 router performs better because it has more advanced hardware (above all, the general-purpose processor, i.e. the CPU).
  3. As expected, performance degrades noticeably when services are enabled.
  4. When CEF is disabled, performance drops significantly, since the router starts processing every packet in a far less efficient way.
  5. Adding the log option to an ACL increases the load on the device (CPU load reaches 99% in this case), which hurts performance. This is because the log option forces the router to process every packet that matches the marked ACL line in Process Switching mode, which significantly increases the load on the processor.
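The ~95 Mbps ceiling in point 1 follows from Ethernet and TCP/IP overhead on a 100 Mbps FastEthernet link. The calculation below assumes a 1500-byte MTU and IPv4/TCP headers either without options or with the 12-byte TCP timestamp option (a common Linux default; the article does not say which was in effect). It counts the preamble, Ethernet header, FCS and inter-frame gap as on-the-wire overhead.

```python
# Per-frame overhead on the wire, in bytes (IEEE 802.3 Ethernet).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12


def tcp_goodput_mbps(line_rate_mbps=100.0, mtu=1500,
                     ip_hdr=20, tcp_hdr=20, tcp_options=12):
    """Maximum TCP payload rate on an Ethernet link with full-size frames."""
    payload = mtu - ip_hdr - tcp_hdr - tcp_options          # TCP payload per frame
    on_wire = mtu + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return line_rate_mbps * payload / on_wire


print(f"no TCP options:  {tcp_goodput_mbps(tcp_options=0):.2f} Mbps")
print(f"with timestamps: {tcp_goodput_mbps():.2f} Mbps")
```

This gives roughly 94 to 95 Mbps, so the measured 95 Mbps is essentially the TCP line rate of FastEthernet rather than a limit of the routers themselves.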

Let's look in more detail at CPU utilization when routing in CEF mode and in Process Switching mode. CEF routing:

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 47%/42%; one minute: 40%; five minutes: 35%
     PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
      89      143724      8597      16717  1.51%  1.42%  1.43%   0 COLLECT STAT COU
       5       25792       638      40426  1.43%  0.29%  0.20%   0 Check heaps
      97       45204    180099        250  0.63%  0.57%  0.47%   0 Ethernet Msec Ti
    …

Total CPU utilization is 47%, of which 42% is spent handling interrupts caused by packet forwarding. There are two kinds of such interrupts: receive interrupts and transmit interrupts. A receive interrupt is raised by the interface processor when a packet has arrived on a router interface and is ready for processing. On receiving such an interrupt, the CPU suspends the current processes and starts processing the packet. Since CEF is enabled, the CPU decides where to forward the packet based on the CEF tables (FIB and Adjacency) while still inside the interrupt. That is, it does not need to hand the packet over for process switching, which saves a great deal of CPU power. As a result, only 5% of the CPU load goes to the router's processes. A transmit interrupt is delivered to the CPU once the interface processor has sent the packet out on the link. The CPU responds by updating counters and freeing the memory that held the packet. In terms of its contribution to total device load, this interrupt is less interesting.
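The two numbers after "five seconds:" are the total CPU load and the share spent at interrupt level, so the process-level load is their difference. A small helper that extracts them from `show processes cpu` output (assuming the output format shown in this article) makes the arithmetic explicit.

```python
import re

# Matches e.g. "five seconds: 47%/42%" -> total 47, interrupt 42.
CPU_LINE = re.compile(r"five seconds:\s*(\d+)%/(\d+)%")


def split_cpu(output: str) -> dict:
    """Split the 5-second figure of 'show processes cpu' into interrupt vs process load."""
    m = CPU_LINE.search(output)
    if m is None:
        raise ValueError("no 'five seconds: X%/Y%' field found")
    total, interrupt = int(m.group(1)), int(m.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


print(split_cpu("CPU utilization for five seconds: 47%/42%; one minute: 40%"))
```

Applied to the three outputs in this section, it gives a 5% process load with CEF, 72% with Process Switching and 62% with the logged ACL.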

Routing in Process Switching mode:

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 99%/27%; one minute: 82%; five minutes: 48%
     PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
     129       98988      6013      16462 69.91% 55.95% 19.35%   0 IP Input
      89      145568      9248      15740  1.11%  1.11%  1.33%   0 COLLECT STAT COU
      97       45480    193804        234  0.23%  0.23%  0.35%   0 Ethernet Msec Ti
    …

Now total CPU utilization is 99%, and only 27% is spent on interrupts. The remaining 72% goes to running processes, and the IP Input process alone takes almost 70% of CPU time. This process is responsible for process switching, i.e. for packets that cannot be handled during the interrupt (for example, when CEF is disabled or its tables lack the information needed to forward the packet, when packets are addressed to the router itself, when the traffic is broadcast, etc.). Since in our example both CEF and Fast Switching are disabled (I have not covered Fast Switching because it is no longer relevant), when a receive interrupt reaches the CPU, the CPU hands the packet over for process switching. The interrupt ends, and the CPU then processes the packet within one of its processes. That is why we see this CPU utilization by the IP Input process.

It is also interesting to look at the CPU load when an ACL with the log option is applied.

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 99%/37%; one minute: 80%; five minutes: 52%
     PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
     129      297672     15360      19379 60.83% 48.79% 29.67%   0 IP Input
      89      150496     10973      13715  0.72%  0.93%  1.22%   0 COLLECT STAT COU
      97       46036    232697        197  0.16%  0.17%  0.21%   0 Ethernet Msec Ti
    …

The log option in the ACL forces the router to process-switch every packet that matches the logged line. As in the previous example, the telltale sign is the high CPU utilization of the IP Input process.
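The punting rules described in the last few paragraphs can be summarized as a decision function. This is a deliberately simplified model of classic IOS switching paths, written for illustration only: the `Packet` fields are hypothetical flags, and Fast Switching is left out, as in the tests above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical flags describing a received packet.
    fib_hit: bool = False       # CEF tables hold a usable FIB entry + adjacency
    to_router: bool = False     # addressed to one of the router's own IPs
    broadcast: bool = False
    acl_log_hit: bool = False   # matches an ACL line carrying the 'log' keyword


def switching_path(pkt: Packet, cef_enabled: bool) -> str:
    """Simplified classic-IOS model: 'interrupt' (CEF) or 'process' (IP Input)."""
    if pkt.to_router or pkt.broadcast:
        return "process"    # punted: the router itself must handle the packet
    if pkt.acl_log_hit:
        return "process"    # logging forces process switching of matching packets
    if cef_enabled and pkt.fib_hit:
        return "interrupt"  # forwarded from FIB/Adjacency inside the receive interrupt
    return "process"        # CEF off or no entry: handed to IP Input
```

In this model, every `"process"` outcome corresponds to CPU time attributed to IP Input, which matches the 60 to 70% IP Input figures in the two outputs above.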

Cisco ASA 5500

Now let's look at the Cisco ASA 5505 firewall. In terms of positioning (small offices and branch offices), the ASA 5505 is similar to the Cisco 881 router. Both devices are in roughly the same price segment and have relatively similar hardware: the ASA 5505 uses an AMD Geode CPU clocked at 500 MHz. The most important difference is the operating system: the ASA 5505 runs ASA OS. We discussed the functional differences between routers and the ASA in a separate article. Now let's look at the performance of the ASA and the impact of various services on it.

Since the ASA has no pure routing mode and none of the dedicated technologies for optimizing traffic routing, only the following modes of operation were tested:

For a more visual comparison, the results for the ASA 5505 and the 881 router are plotted on one graph (Figure 2).


The graph shows that in all modes of operation the throughput of the ASA 5505 is limited only by the test bench itself. Moreover, the CPU load is almost identical in all modes:

    cbs-asa-vpn# sh proc cpu-usage non-zero sorted
    PC         Thread       5Sec     1Min     5Min   Process
    0x082a2849 0xa86e0994   31.1%    25.4%    13.4%   Dispatch Unit
    0x09bcebdb 0xa86d094c    6.4%     5.1%     5.9%   esw_stats
    0x08e68295 0xa86ced10    0.2%     0.1%     0.2%   ci/console
    0x0919171d 0xa86c9404    0.2%     0.2%     0.2%   IP SLA Mon Event Processor
    0x08f0591c 0xa86ce68c    0.1%     0.1%     0.1%   update_cpu_usage

The following conclusions can be drawn:
  1. With a comparable price and hardware, the ASA 5505 delivers higher performance than the 881 router.
  2. ASA performance is practically independent of the enabled services (at least, no dependence could be detected on this test bench).
  3. The log option in an ACL does not degrade performance. This is due to how packet forwarding is implemented on the device.

Thus, ASA OS appears better balanced in terms of how services affect device performance.

Cisco ISR 4000

Moving on. Let's see how services affect the performance of the Cisco ISR 4000 routers, the newest Cisco router line for small and medium-sized deployments. As we remember, these routers run the Cisco IOS XE operating system, which can work in multi-threaded mode. On the hardware side, these routers use multicore processors.

So we take the entry-level model of the ISR 4000 line, the 4321, off the shelf. We activate the performance license on it to get the declared maximum throughput of 100 Mbps and start testing. It is important to note that ISR 4000 routers always use a shaper that caps the maximum throughput of the device. There are two thresholds: basic (50 Mbps for the 4321) and extended (100 Mbps for the 4321, activated by the performance license). This scheme is meant to make device performance predictable by not letting an excessive amount of traffic through.
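The article does not describe how the ISR 4000 shaper is built internally. As a generic illustration of capping traffic at a fixed threshold, here is a textbook token bucket. The burst size and the 200 Mbps offered load are arbitrary assumptions, and a real shaper would queue non-conforming packets and send them later, whereas this simulation simply does not count them.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative; not Cisco's implementation)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes = rate_bps / 8.0   # token refill rate, bytes per second
        self.capacity = burst_bytes        # maximum burst, bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.last = 0.0                    # time of the previous check, seconds

    def conforms(self, size_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if the packet fits."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False


def simulate(limit_mbps: float, offered_mbps: float = 200.0,
             seconds: float = 1.0, pkt_bytes: int = 1500) -> float:
    """Offer a constant stream of packets and return the conforming rate in Mbps."""
    bucket = TokenBucket(limit_mbps * 1e6, burst_bytes=pkt_bytes * 10)  # assumed burst
    interval = pkt_bytes * 8 / (offered_mbps * 1e6)  # gap between offered packets
    sent, t = 0, 0.0
    while t < seconds:
        if bucket.conforms(pkt_bytes, t):
            sent += pkt_bytes
        t += interval
    return sent * 8 / seconds / 1e6


print(f"basic:    {simulate(50):.1f} Mbps")
print(f"extended: {simulate(100):.1f} Mbps")
```

Under these assumptions, the conforming rate settles at the configured threshold (plus the small initial burst) no matter how much traffic is offered, which is the "predictable performance" property the shaper is meant to provide.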

First, we check the performance of pure routing in CEF mode without additional services. We run iPerf3 and get 95 Mbps, as expected. Let's look at the CPU load at this point:

    cbs-rtr-4321#show proc cpu sorted
    CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%
     PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
     658     8563421   409607083         20  0.47%  0.48%  0.48%   0 IP SLAs XOS Even
      79     1123726    12408975         90  0.15%  0.06%  0.07%   0 IOSD ipc task
       2      120745      326115        370  0.07%  0.00%  0.00%   0 Load Meter
     667         420        1850        227  0.07%  0.03%  0.04%   2 SSH Process
    …

There you have it: a CPU load of 1%. Nice! But things are not quite that simple. The explanation comes from a closer look at how IOS XE works.

IOS XE is an operating system built on Linux, heavily modified and optimized by the vendor. The traditional Cisco IOS runs as a separate Linux process (IOSd). The most interesting point is that IOS XE has a separate main process that performs the data plane functions. That is, control plane and data plane are clearly separated at the software level. The process responsible for the control plane is called linux_iosd-imag; this is the IOS we are used to. The process responsible for the data plane is called qfp-ucode-utah. Does QFP sound familiar? It immediately brings to mind the QuantumFlow Processor, the network processor in ASR 1000 routers. Since IOS XE first appeared on those routers, the process responsible for forwarding packets got "qfp" in its name. For the ISR 4000, apparently, nothing was changed, with one difference: in the ISR 4000 the QFP is virtual (it runs on dedicated cores of a general-purpose processor). Besides the processes mentioned, IOS XE has other auxiliary processes.

Thus, to see how heavily the CPU is loaded, we need to analyze the output of the following IOS XE-specific commands:

    cbs-rtr-4321#show platform software status control-processor brief
    Load Average
     Slot  Status  1-Min  5-Min 15-Min
      RP0 Healthy   1.14   1.05   1.01
    Memory (kB)
     Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
      RP0 Healthy  3950540  3888836 (98%)    61704 ( 2%)   2517892 (64%)
    CPU Utilization
     Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
      RP0    0   5.28  10.57   0.00  79.84   4.19   0.09   0.00
             1   1.80   1.60   0.00  95.99   0.50   0.10   0.00
             2  41.00   2.70   0.00  56.30   0.00   0.00   0.00
             3  23.02  76.97   0.00   0.00   0.00   0.00   0.00

Our router has four cores (CPU 0, 1, 2 and 3), and this command shows the load on each of them.
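Per-core load is easiest to read as 100 minus the Idle column. The parser below is a sketch, not an official tool. It assumes the column layout shown in the output above (CPU, User, System, Nice, Idle, IRQ, SIRQ, IOwait, with the slot name only on the first row), and the sample rows are copied from that output.

```python
CPU_ROWS = """\
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
  RP0    0   5.28  10.57   0.00  79.84   4.19   0.09   0.00
         1   1.80   1.60   0.00  95.99   0.50   0.10   0.00
         2  41.00   2.70   0.00  56.30   0.00   0.00   0.00
         3  23.02  76.97   0.00   0.00   0.00   0.00   0.00
"""


def busy_per_core(table: str) -> dict[int, float]:
    """Map core number -> busy % (100 - Idle) from the 'CPU Utilization' rows."""
    busy = {}
    for line in table.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("RP"):
            fields = fields[1:]          # the first row also carries the slot name
        if len(fields) != 8:
            continue                     # header or unrelated line
        try:
            cpu, idle = int(fields[0]), float(fields[4])
        except ValueError:
            continue
        busy[cpu] = round(100.0 - idle, 2)
    return busy


print(busy_per_core(CPU_ROWS))
```

For the sample, it reports core 3 as 100% busy, core 2 at about 44%, core 0 at about 20% and core 1 at about 4%.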

Note

You can view the router's hardware details in the standard Linux dmesg output: more flash:/tracelogs/dmesg.

The ISR 4321 uses the following processor:
CPU0: Intel® Atom (TM) CPU C2558 @ 2.40GHz stepping 08

The following command shows how individual processes use CPU capacity:

    cbs-rtr-4321#show platform software process slot RP active monitor cycles 1 interval 1
    top - 15:03:45 up 18 days, 21:00,  0 users,  load average: 1.13, 1.05, 1.01
    Tasks: 316 total,   2 running, 314 sleeping,   0 stopped,   0 zombie
    Cpu(s):  8.8%us, 22.3%sy,  0.0%ni, 68.8%id,  0.0%wa,  0.1%hi,  0.0%si,  0.0%st
    Mem:   3950540k total,  3889372k used,    61168k free,   199752k buffers
    Swap:        0k total,        0k used,        0k free,  1608388k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     3111 root      20   0 1041m 589m 333m S  150 15.3  28747:48 qfp-ucode-utah
     1915 root      20   0 1957m 182m 124m S   10  4.7   2216:08 fman_fp_image
    22575 root      20   0  360m  74m  30m S    2  1.9 392:16.70 bsm
    23130 root      20   0 46828  25m  11m S    2  0.7  23:08.43 cmand
    26108 root      20   0 2378m 896m 374m S    2 23.2 881:05.01 linux_iosd-imag
    27088 root      20   0  2204 1096  728 R    2  0.0   0:00.02 top
        1 root      20   0  1820  520  440 S    0  0.0   0:10.97 init
        2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    …

In this example, IOSd (linux_iosd-imag) consumes only 2%, while QFP (qfp-ucode-utah) consumes 150%, i.e. one core fully plus half of another.
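Linux `top` reports %CPU relative to a single core by default, so values above 100% mean that more than one core is in use. A small helper, assuming the standard `top` column order seen above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), converts a process's %CPU into core-equivalents. The sample rows are copied from the output above.

```python
TOP_ROWS = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3111 root      20   0 1041m 589m 333m S  150 15.3  28747:48 qfp-ucode-utah
26108 root      20   0 2378m 896m 374m S    2 23.2 881:05.01 linux_iosd-imag
"""


def core_equivalents(top_output: str, command: str) -> float:
    """Return the %CPU of `command` from top output, expressed in whole cores."""
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) == 12 and fields[-1] == command:
            return float(fields[8]) / 100.0   # %CPU is the 9th column
    raise KeyError(command)


print(core_equivalents(TOP_ROWS, "qfp-ucode-utah"))   # QFP data plane
print(core_equivalents(TOP_ROWS, "linux_iosd-imag"))  # IOSd control plane
```

This turns the 150% for qfp-ucode-utah into 1.5 cores and the 2% for linux_iosd-imag into 0.02 of a core.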

So what does show processes cpu display, then? It shows the load on the virtual CPU allocated to the IOSd process. On ISR 4000 routers, one of the CPU cores is allocated to this process.

From all this, we can conclude that the packet processing architecture in IOS XE has changed significantly compared with classic IOS. IOS no longer handles every single packet; it processes only the packets that require process switching. Even then, IOS XE uses a newer Fastpath mechanism, which passes packets for process switching via a separate thread within IOSd rather than through interrupts. Interrupts in IOSd occur only when processing via Fastpath is not possible.

Let's return to our task. The following modes were checked:

Note that CEF cannot be disabled on the 4321 (or indeed on any router in the ISR 4000 line); it is now the base routing technology.

The test results are presented in Figure 3. For clarity, the graph shows the throughput (identical in all cases) and the CPU load of QFP. The IOSd process is not of interest, since in all modes the virtual CPU utilization inside IOSd is minimal, at 1%.


Testing did not reveal any dependence of ISR 4321 performance on the enabled services. CPU utilization increases slightly, but not significantly. It is also worth noting that the log option in an ACL no longer causes dramatic performance losses, since packets are not punted for process switching.

Results

Using several devices of different generations and types as examples, we looked at how performance depends on the services enabled. Overall, the results are consistent with what was already known; we have not discovered anything new. The brief conclusions from testing can be summarized as follows:

  1. There is a significant degradation of the performance of the ISR G1 and G2 routers when services are enabled.
  2. ASA performance is less affected by services. At a price comparable to a router, it delivers higher performance.
  3. The impact of enabling services on ISR 4000 performance is minimal.

Thank you for your attention. I hope some of the information in this article will help you in working with Cisco equipment.

Source: https://habr.com/ru/post/303770/

