
Cisco reveals the features of its 400-gigabit NPU

At Hot Chips, held in August of this year, Cisco lead engineer Jamie Markevitch described the 400 Gb/s network processor that the company is already shipping to customers.


Photo: timothy lorens / Flickr / CC

The chip is built on a 22 nm process and has 672 cores, each running up to four threads. The network processor (NPU) contains 9.2 billion transistors and 353 Mb (megabits) of SRAM. The SRAM serves as an L0 cache that holds instructions and data for each thread, and each cluster of 16 cores also shares an L1 cache.
The NPU has 42 core clusters, which share an L2 instruction cache. An on-chip interconnect ties the caches at every level, packet storage, the accelerators, and on-chip memory and DRAM into a single "network". This network runs at 1 GHz and has a bandwidth of more than 9 TB/s.
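The headline figures are consistent with one another, as a quick Python check shows: 42 clusters of 16 cores give 672 cores, four threads per core give 2,688 hardware threads, and a network moving 9 TB/s at 1 GHz transfers at least 9,000 bytes per clock cycle. This is only arithmetic on the numbers quoted above, not a model of the chip; the function and constant names are our own.

```python
# Back-of-the-envelope check of the NPU figures quoted in the article.
# Inputs are the article's numbers; nothing here models real hardware.

CLUSTERS = 42             # core clusters
CORES_PER_CLUSTER = 16    # cores sharing one L1 cache
THREADS_PER_CORE = 4      # hardware threads per core
FABRIC_CLOCK_HZ = 1e9     # on-chip network clock (1 GHz)
FABRIC_BW_BYTES = 9e12    # "more than 9 TB/s", used as a lower bound


def npu_topology():
    """Derive total cores, total threads and minimum bytes per cycle."""
    cores = CLUSTERS * CORES_PER_CLUSTER
    threads = cores * THREADS_PER_CORE
    # Minimum aggregate bytes the on-chip network moves per clock cycle
    bytes_per_cycle = FABRIC_BW_BYTES / FABRIC_CLOCK_HZ
    return cores, threads, bytes_per_cycle


if __name__ == "__main__":
    cores, threads, bpc = npu_topology()
    print(f"cores={cores} threads={threads} bytes/cycle>={bpc:.0f}")
```

The same 2,688 figure reappears below as the number of packets the chip can hold in flight.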


Chip block diagram

Cisco did not disclose the instruction set used in the NPU. However, observers assume it is a custom instruction set designed specifically for networking workloads rather than ARM, MIPS, Power, or x86.

Each packet is processed by a single core for its entire "life" on the chip. This avoids idle time and the overhead of "juggling" packets between cores. With 672 cores running four threads each, 2,688 packets can be processed simultaneously. Packets are stored off-chip in DRAM but processed in real time in SRAM. Moreover, accelerators can access the DRAM copies independently of the cores working on the SRAM originals.
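The run-to-completion model described above can be sketched as a simple thread pool: a packet is bound to one hardware thread when it arrives and keeps that thread until it is done, so at most 2,688 packets are in flight and later arrivals wait. This is a hypothetical Python model, not Cisco's scheduler; the class `RunToCompletionPool`, its methods, and the FIFO backlog are our assumptions, and only the 672 × 4 thread count comes from the article.

```python
from collections import deque

TOTAL_THREADS = 672 * 4  # 2,688 hardware threads, per the article


class RunToCompletionPool:
    """Hypothetical model: each packet stays on one hardware thread from
    arrival until it leaves the chip; it never migrates between cores."""

    def __init__(self, n_threads: int = TOTAL_THREADS):
        self.free = deque(range(n_threads))  # idle thread IDs
        self.owner = {}                      # thread ID -> packet ID
        self.backlog = deque()               # packets waiting for a thread

    def arrive(self, packet_id):
        """Bind a new packet to an idle thread, or queue it if none is free."""
        if self.free:
            tid = self.free.popleft()
            self.owner[tid] = packet_id
            return tid
        self.backlog.append(packet_id)  # all threads busy: packet waits
        return None

    def finish(self, tid):
        """The packet on thread `tid` is done; give the thread to the next waiter."""
        done = self.owner.pop(tid)
        if self.backlog:
            self.owner[tid] = self.backlog.popleft()
        else:
            self.free.append(tid)
        return done

    def in_flight(self):
        return len(self.owner)


if __name__ == "__main__":
    pool = RunToCompletionPool()
    for p in range(3000):
        pool.arrive(p)
    print(pool.in_flight(), len(pool.backlog))
```

Feeding 3,000 packets leaves 2,688 in flight and 312 waiting; when a thread finishes, the next waiting packet takes over that same thread instead of being handed between cores mid-processing.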

Because different packets have different processing requirements, the cores differ in performance to maximize efficiency. At the same time, the Cisco NPU supports conventional programming in C or assembly.

The network processor handles 800 Gb/s of traffic in total, that is, 400 Gb/s in each direction in full-duplex mode. The SerDes interfaces provide an aggregate throughput of 6.5 Tb/s. Most of these links connect the chip to DRAM and to TCAM, which stores access control lists (ACLs). The latter also serves as a packet buffer, so its capacity is sometimes insufficient, in which case part of the data is kept in DRAM.
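To illustrate what a TCAM does for ACLs: each entry stores a value plus a mask of "don't care" bits, and a lookup returns the highest-priority entry that matches the key. The Python sketch below emulates that ternary matching in software. The entry layout, the 32-bit IPv4 key, and the example rules are assumptions for illustration, not Cisco's ACL format.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TcamEntry:
    value: int   # bit pattern to compare against
    mask: int    # 1 = bit must match, 0 = "don't care" (wildcard)
    action: str  # hypothetical ACL action, e.g. "permit" or "deny"


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[str]:
    """Return the action of the first matching entry (list order = priority).

    A hardware TCAM compares every entry in parallel in a single lookup;
    this loop reproduces only the result, not the timing.
    """
    for entry in entries:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None  # no entry matched


def ipv4(addr: str) -> int:
    """Convert a dotted-quad IPv4 address into a 32-bit lookup key."""
    return int(ipaddress.IPv4Address(addr))


# Hypothetical ACL: permit one /24, deny the rest of 10.0.0.0/8.
acl = [
    TcamEntry(ipv4("10.1.2.0"), 0xFFFFFF00, "permit"),  # 10.1.2.0/24
    TcamEntry(ipv4("10.0.0.0"), 0xFF000000, "deny"),    # 10.0.0.0/8
]

if __name__ == "__main__":
    for dst in ("10.1.2.7", "10.9.9.9", "192.0.2.1"):
        print(dst, tcam_lookup(acl, ipv4(dst)))
```

Here 10.1.2.7 hits the more specific permit rule first, 10.9.9.9 falls through to the deny rule, and 192.0.2.1 matches nothing. Ordering entries by priority is what lets a TCAM express "deny this range except that subnet" in two rows.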

Most of the NPU logic runs at 760 MHz or 1 GHz. The MAC interfaces support port speeds from 10 to 100 Gb/s.

The network processor has an integrated traffic manager that handles 256 thousand requests simultaneously and can sustain a load of half a trillion objects. Hardware accelerators take over the handling of IPv4 and IPv6 prefixes, compression and hashing of IP ranges, packet delivery, and statistics collection.
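The article does not say exactly which prefix operations the accelerators perform; the standard one in routing is presumably longest-prefix match (LPM), which picks the most specific route containing a destination address. The minimal Python sketch below shows that operation using the standard `ipaddress` module. It is a linear scan for readability, whereas hardware typically relies on tries, hashing, or TCAM; the routing table and next-hop names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def longest_prefix_match(table: Dict[Network, str], addr: Address) -> Optional[str]:
    """Return the next hop of the most specific prefix containing `addr`.

    Linear scan for readability; hardware LPM engines use tries, hashing
    or TCAM instead. IPv4 and IPv6 prefixes may share one table.
    """
    best_len, best_hop = -1, None
    for net, next_hop in table.items():
        # `addr in net` is simply False when the IP versions differ
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


# Hypothetical routing table with made-up next-hop names.
routes = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.1.0.0/16"): "edge-1",
    ipaddress.ip_network("2001:db8::/32"): "v6-core",
}

if __name__ == "__main__":
    for a in ("10.1.5.5", "10.2.0.1", "8.8.8.8", "2001:db8::1", "2001:dead::1"):
        print(a, longest_prefix_match(routes, ipaddress.ip_address(a)))
```

An address covered by several prefixes (10.1.5.5 lies in /0, /8 and /16) resolves to the longest one, and an IPv6 address outside every IPv6 prefix returns `None` because this example table has no IPv6 default route.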

The external DRAM is connected through 28 SerDes lanes running at 12.5 Gb/s each. The link uses a proprietary serial protocol for memory access: it sustains up to a billion random accesses per second and supports data transfer at up to 300 Gb/s.
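A quick budget relates these figures: 28 lanes at 12.5 Gb/s give 350 Gb/s of raw line rate, against a quoted maximum data rate of 300 Gb/s. The article does not explain the roughly 50 Gb/s gap; attributing it to line coding and protocol overhead is our assumption, and the function name is illustrative.

```python
def serdes_budget(lanes: int = 28, lane_gbps: float = 12.5,
                  payload_gbps: float = 300.0):
    """Compare raw SerDes line rate with the quoted usable data rate.

    Defaults are the article's figures for the external-memory link.
    """
    raw_gbps = lanes * lane_gbps             # aggregate line rate
    efficiency = payload_gbps / raw_gbps     # fraction delivered as data
    # Gap not explained in the article (assumed: coding/protocol overhead)
    overhead_gbps = raw_gbps - payload_gbps
    return raw_gbps, efficiency, overhead_gbps


if __name__ == "__main__":
    raw, eff, over = serdes_budget()
    print(f"raw {raw:.1f} Gb/s, data {eff:.1%} of raw, gap {over:.1f} Gb/s")
```

With the defaults, about 85.7% of the raw lane bandwidth is delivered as data.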

The logic connects to the DRAM through a parallel I/O interface with a maximum speed of 1250 Mb/s. Interestingly, only the processor itself is made on the 22 nm process: the DRAM is manufactured at 30 nm, while the SerDes and BIST logic use 28 nm.

“We determined which operations such devices typically perform and optimized the chip for high-speed random access. It can be used as a buffer, where the number of reads equals the number of writes, or for database lookups where updates are relatively infrequent,” said Cisco's Jamie Markevitch.

Public disclosure of a network processor's internals is rare, though not unprecedented. Manufacturers usually keep such information private, but exceptions do occur: in January Barefoot Networks described its Tofino chip, in March Innovium presented Teralynx, and in July Mellanox Technologies discussed Spectrum-2.

About Hot Chips

Hot Chips is a symposium on high-performance processors, first held in 1989. This year, besides Cisco, many major manufacturers presented at the event. In particular, Microsoft showed its work in augmented reality and talked about the processor for the Xbox One X (Project Scorpio). The presentation by the Chinese company Baidu was devoted to augmented reality, and a Google representative spoke about optimizing hardware for neural networks.


Source: https://habr.com/ru/post/338620/

