
We have already come to terms with the fact that processor clock frequencies have stopped rising and that manufacturers have turned to parallel computation instead. However, the number of cores in a typical general-purpose processor, after quickly passing 2 and 4, stalled at about 8. Some have even been ready to bury Moore's law.
This stagnation has an objective reason. The difference between 2, 4, and 8 cores is mostly quantitative, but a 16-core processor runs into the fundamental limitations of the traditional architecture. For the past few decades, the bus has served as the basis for communication between the individual IP blocks of a chip. While there were few blocks, it coped, but as cores began to multiply, this architecture exhausted itself. The bus is a shared transmission medium to which several processor units are connected. At any given time only one block can transmit data, while all the others can only receive. If several blocks need to transmit simultaneously, they contend for the bus, and some of them are delayed. With more than eight cores, the delays become unacceptably large and almost completely cancel out the advantage of running several cores in parallel.
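The effect of bus contention can be illustrated with a toy model. This is only a sketch under stated assumptions, not a model of any real bus: the function name `average_bus_wait`, the FIFO arbitration, the single-cycle transfers, and the fixed request period are all hypothetical choices made for illustration.

```python
from collections import deque


def average_bus_wait(num_cores, period, cycles=10_000):
    """Average number of cycles a request waits for a shared bus.

    Hypothetical model: every core issues one single-cycle transfer
    every `period` cycles, and the bus grants exactly one transfer
    per cycle in FIFO order.
    """
    queue = deque()  # issue times of the pending requests
    total_wait = 0
    served = 0
    for cycle in range(cycles):
        if cycle % period == 0:
            # All cores issue a request in the same cycle (worst case).
            queue.extend([cycle] * num_cores)
        if queue:
            issued = queue.popleft()
            total_wait += cycle - issued
            served += 1
    return total_wait / served


for cores in (2, 4, 8, 16):
    print(cores, average_bus_wait(cores, period=8))
```

In this model the bus saturates once the number of cores reaches the request period. Beyond that point the queue never drains and the average wait grows with the length of the simulation. The threshold of 8 here follows from the assumed `period=8`, not from measurements of real hardware.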
The number of cores can be pushed a little higher by splitting the bus into several segments connected by bridges, but this is more of a “crutch” that scales poorly and does not solve the underlying problem. The solution that makes it possible to combine hundreds of units on a single chip is the well-known packet-switched network, here called a Network on Chip (NoC).

The transition from bus to network is quite natural. Telecommunication networks developed the same way: over-the-air radio is a typical “bus”, telephone networks use circuit switching with matrix switches, and the Internet uses packet switching. Computer peripherals have followed the same path: the modern PCI Express bus is not really a bus at all, but a network with a star topology. Processors are evolving likewise: first direct connections between blocks, then buses and matrix switches, and finally networks.
In the NoC architecture, each core or processor unit is connected to a router, through which it communicates with the other units. The routers themselves form a network in which data packets travel from one unit to another, just like packets in an ordinary computer network. This greatly simplifies the topology of the chip and removes the scaling limits: unlike on a bus, many blocks can communicate simultaneously without interfering with each other. Computer simulations and prototypes of multi-core processors show that with a large number of cores this architecture is superior to the traditional one in many respects.
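To make the idea of simultaneous transfers concrete, here is a minimal sketch. It assumes a 2D mesh of routers and XY (dimension-order) routing. Both are common choices in NoC designs, but the text above does not specify a topology or routing algorithm, so treat them as illustrative assumptions. The helper names `xy_route` and `links` are hypothetical.

```python
def xy_route(src, dst):
    """Routers visited from src to dst in a 2D mesh.

    The packet first moves along X until the column matches,
    then along Y (dimension-order routing).
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def links(path):
    """Directed router-to-router links used by a path."""
    return set(zip(path, path[1:]))


a = xy_route((0, 0), (2, 1))
b = xy_route((0, 3), (3, 3))
print(a)
# The two transfers share no link, so they can proceed in parallel.
print(links(a).isdisjoint(links(b)))
```

On a bus, these two transfers would have to take turns. In the mesh they use separate links, so neither delays the other.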
Naturally, it would be unwise and inefficient to transplant the logic and protocols of the Internet directly onto a chip. The technological constraints and goals are completely different:
- Very stringent latency and power requirements. Switches must operate with nanosecond delays and consume very little energy. Data transfer between blocks accounts for a significant share of the total power consumption of modern chips.
- Simplicity and minimalism. Switches on a chip must take up little area, which means they cannot have complex logic or large buffers.
- Parallel rather than serial links. At the physical level inside a chip, it is more efficient to transfer bits not one after another over a single conductor, but over 32 or 64 parallel wires.
Leading companies and universities around the world are researching NoCs. In 2007, Intel developed an experimental 80-core processor delivering 1 teraflops at a total power consumption of 62 watts. In 2010, the 48-core Single-chip Cloud Computer was introduced.
In April of this year, a group of MIT researchers published work describing a prototype 16-core processor that uses NoC-specific optimizations: virtual bypassing and low-swing signaling. These techniques come close to the theoretical limits of throughput and latency and significantly reduce energy consumption.
How do they work? A conventional router stores each received packet in a buffer, parses its header, and decides where to send it next. Virtual bypassing lets a packet pass through with almost no delay: the header is sent ahead of the packet, so the switch has time to set up the required connection before the packet body arrives. The packet then travels non-stop, bypassing the buffer. Low-swing signaling reduces the voltage difference between logical 0 and 1 on a wire, which cuts power consumption further. Together, these improvements raise throughput and efficiency by more than a factor of 1.5.
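The latency benefit of bypassing can be sketched with a simple per-hop cost model. The cycle counts below are hypothetical, not figures from the MIT prototype. The model also assumes that a packet falls back to the buffered path whenever the output port is busy at the moment its header arrives, which the description above does not state.

```python
# Hypothetical per-router costs, in cycles.
BUFFERED_CYCLES = 3  # buffer write, header parsing and routing, switch traversal
BYPASS_CYCLES = 1    # switch was already set up by the early header


def path_latency(port_free_per_hop):
    """Cycles spent inside routers along a path.

    Each entry is True if the output port was free when the early
    header arrived (bypass succeeds), and False if the packet had to
    be buffered (assumed fallback).
    """
    return sum(
        BYPASS_CYCLES if port_free else BUFFERED_CYCLES
        for port_free in port_free_per_hop
    )


print(path_latency([False] * 4))               # every hop buffered
print(path_latency([True] * 4))                # every hop bypassed
print(path_latency([True, False, True, True])) # one congested router
```

Under these assumed costs, a four-hop path takes 12 router cycles when every hop is buffered and 4 when every hop is bypassed. A single congested router on the path costs only the difference for that one hop.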
Besides improving characteristics such as power consumption and speed, the NoC architecture offers another important advantage. It makes it easy to combine not only homogeneous cores but practically any blocks on a single chip. As in computer networks, the physical and transport layers work the same way for all types of data and protocols. One or more of the general-purpose cores can easily be replaced with any other IP block, for example a graphics core, a specialized signal processor, or a device controller. And, just as in networks, Quality of Service can be supported at the chip level, which can be useful for real-time systems and virtualization.
NoCs for connecting processor cores are still experimental, but NoCs for connecting heterogeneous blocks in systems on a chip have been developed and used for quite some time. Solutions from companies such as Sonics and Arteris are used in chips from Samsung, Qualcomm, and even Intel. It is possible that network architectures will soon begin to push buses out of the “holy of holies” of microelectronics, the multi-core central processor. Then the number of cores will begin to grow rapidly again. So it is still too early to bury Moore's law.
Additional sources on the topic:
- Intel presentation
- List of NoC research groups
- Tilera, a developer of processors based on the NoC architecture
- Comparison of NoC and bus architecture
- NoC overview lecture