
Comparing tick-to-trade delays with CEPappliance and Solarflare TCPDirect

In this article we present the latencies measured in two environments: the FPGA-based CEPappliance device (the "piece of hardware") and a computer with a Solarflare network card in TCPDirect mode. We also describe the measurement method and its technical implementation. At the end of the article there is a link to GitHub with the results and some of the source code.

We believe our results may be of interest to high-frequency traders, algorithmic traders, and anyone else who cares about low-latency data processing.

Measurement technique: what and how we measured


The test bench is laid out as follows:

[Figure: test bench diagram]

The SUT (System Under Test) is either a CEPappliance or a server with a Solarflare card (the characteristics of both systems are given below).

CEPappliance and Solarflare share a field of application: high-frequency and algorithmic trading. We therefore based the test on a scenario from this field and measured the delay from the moment the test driver sends the last byte of a packet with market data (the tick) until it receives the first byte of the packet carrying the order (the trade) destined for the exchange. This is the so-called tick-to-trade delay. The MAC- and PHY-level delays of the test driver are the same for both test environments and have been subtracted from the values reported below. By starting the clock when the driver sends the last byte, we eliminate the influence of the data transmission and reception rate, which depends on the physical layer.

You can measure the delay using another method, such as the time from the moment the driver sends the first byte to the moment it receives the first byte from the system being measured. Such a delay will be longer and can be calculated on the basis of our measurements using the formula:

latency 1-1 = latency N-1 + 6.4 * int((N + 7) / 8),

where latency N-1 is the delay we measured (from the moment the driver sent the last byte to the moment it received the first byte), N is the Ethernet frame length in bytes, and int(x) truncates a real number to an integer by dropping its fractional part. Latencies are in nanoseconds; 6.4 ns is the time needed to transmit 8 bytes over a 10 Gbit/s link.
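The conversion formula above can be sketched in a few lines of Python. The function name and the 64-byte frame length in the example are illustrative assumptions; the article does not state the frame size used in the tests.

```python
NS_PER_8_BYTES = 6.4  # constant from the formula: time to send 8 bytes at 10 Gbit/s


def first_to_first_latency_ns(last_to_first_ns: float, frame_len_bytes: int) -> float:
    """Convert a last-byte-sent -> first-byte-received latency (latency N-1)
    into a first-byte-sent -> first-byte-received latency (latency 1-1)."""
    # int((N + 7) / 8) is the frame length rounded up to whole 8-byte blocks
    blocks = (frame_len_bytes + 7) // 8
    return last_to_first_ns + NS_PER_8_BYTES * blocks


# Hypothetical example: a measured latency of 640 ns and a 64-byte frame
print(f"{first_to_first_latency_ns(640, 64):.1f} ns")
```

Integer floor division reproduces the int() truncation exactly for positive frame lengths.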

The processing pipeline whose execution time is the latency of interest looks like this:

[Figure: processing pipeline]

What are the stages of testing?

Preparation:


Testing:


Processing test results:


Test bench for Solarflare


The SUT is a server with an Asus P9X79 WS motherboard, an Intel Core i7-3930K CPU @ 3.20GHz processor and an SFN8522-R2 Flareon Ultra 8000 Series 10G Adapter, which supports TCPDirect.

For this test bench, we wrote a C program that receives UDP packets through the Solarflare TCPDirect API, parses them, builds the order book, and generates and sends a buy order using the FIX protocol.
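To illustrate the last step of that pipeline, here is a minimal Python sketch of how a FIX order message is framed. It is not the authors' C code. The FIX version (4.4), the helper name, and all field values are assumptions made for the example; only the tag=value layout, the SOH delimiter, and the BodyLength (9) and CheckSum (10) rules come from the FIX standard.

```python
SOH = "\x01"  # FIX field delimiter


def build_fix_message(msg_type: str, fields: list[tuple[int, str]],
                      begin_string: str = "FIX.4.4") -> bytes:
    """Frame a FIX message: BeginString, BodyLength, body, CheckSum."""
    # Body starts with MsgType (35); every field ends with SOH
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength (9) counts the bytes between its own SOH and the CheckSum field
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    payload = (head + body).encode("ascii")
    # CheckSum (10) is the sum of all preceding bytes modulo 256, as three digits
    checksum = sum(payload) % 256
    return payload + f"10={checksum:03d}{SOH}".encode("ascii")


# Hypothetical buy order (NewOrderSingle, 35=D); all values are made up
order = build_fix_message("D", [
    (49, "SENDER"),   # SenderCompID
    (56, "TARGET"),   # TargetCompID
    (34, "1"),        # MsgSeqNum
    (11, "ORD1"),     # ClOrdID
    (55, "TEST"),     # Symbol
    (54, "1"),        # Side: 1 = buy
    (38, "10"),       # OrderQty
    (40, "2"),        # OrdType: 2 = limit
    (44, "100.5"),    # Price
])
print(order.decode("ascii").replace(SOH, "|"))
```

A latency-critical implementation would typically pre-build the constant parts of the message and patch in only the fields that change, which matches the hard-coded approach described in the article.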

Message parsing, order book construction, and order message generation are all hard-coded, with no support for variations and no checks, in order to keep the delay to a minimum. The code is available on GitHub.

Test bench for the CEPappliance hardware


The SUT is the CEPappliance, or "piece of hardware" as we call it: a DE5-Net board with an Altera Stratix V FPGA chip, inserted into the server's PCIe slot, from which it draws power and nothing else. Management and data exchange with the board are carried out over a 10G Ethernet connection.

As we have described previously, our firmware for the FPGA chip contains many different components, including everything necessary to implement the test scenario described here.

The program implementing the scenario on CEPappliance is contained in two files. One file holds the data processing logic, which we call a schema. The other describes the adapters through which the schema (or the hardware that executes it) interacts with the outside world. That is all there is to it!

For CEPappliance, we implemented two versions of the schema and took measurements for each. In one version (CEPappliance ALU), the logic is implemented in the embedded high-level language (see lines 47-67). In the other (CEPappliance WIRE), it is implemented in Verilog (see lines 47-54).

Results


Measured tick-to-trade delays in nanoseconds:
SUT                   min   avg   max   stddev  95%   97%   99%   99.9%
Solarflare TCPDirect  1411  1637  2638  150     2022  2116  2303  2619
CEPappliance ALU      1050  1125  1620  45      1251  1320  1415  1549
CEPappliance WIRE     561   640   1163  45      768   825   907   1087
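The columns of the table are standard summary statistics over the individual latency samples. The following Python sketch shows one way they could be computed. The nearest-rank percentile method and the population standard deviation are assumptions, since the article does not say which definitions were used, and the sample values are invented.

```python
import math
import statistics


def summarize(samples_ns, percentiles=(95, 97, 99, 99.9)):
    """Return min/avg/max/stddev and nearest-rank percentiles of latency samples."""
    ordered = sorted(samples_ns)
    if not ordered:
        raise ValueError("at least one sample is required")
    n = len(ordered)
    summary = {
        "min": ordered[0],
        "avg": statistics.fmean(ordered),
        "max": ordered[-1],
        "stddev": statistics.pstdev(ordered),  # population stddev (assumption)
    }
    for p in percentiles:
        # Nearest-rank: smallest sample with at least p% of samples at or below it
        rank = max(1, math.ceil(p * n / 100))
        summary[f"{p}%"] = ordered[rank - 1]
    return summary


# Hypothetical samples in nanoseconds
for name, value in summarize([640, 655, 630, 700, 612, 1100, 648, 661]).items():
    print(f"{name:>6}: {value}")
```

With few samples the high percentiles collapse onto the maximum, so a large sample count is needed for the 99.9% column to be meaningful.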

[Figure: measurement results]


Conclusions


No miracle happened: the FPGA-based hardware turned out to be faster than the server-based solution with Solarflare TCPDirect. The higher the percentile, the more noticeable the difference in speed. At the same time, the latency of the CEPappliance solution has a variance an order of magnitude lower.
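Both claims can be checked against the table of results with simple arithmetic. The sketch below copies the Solarflare TCPDirect and CEPappliance ALU rows from the table. Reading the statement about spread as a statement about variance (standard deviation squared) is our interpretation.

```python
# Values in nanoseconds, copied from the results table
solarflare = {"avg": 1637, "stddev": 150, "95%": 2022, "97%": 2116, "99%": 2303, "99.9%": 2619}
cep_alu = {"avg": 1125, "stddev": 45, "95%": 1251, "97%": 1320, "99%": 1415, "99.9%": 1549}

# Variance is the square of the standard deviation
variance_ratio = (solarflare["stddev"] / cep_alu["stddev"]) ** 2
print(f"variance ratio: {variance_ratio:.1f}")

# Absolute gap between the two systems at each level
levels = ["avg", "95%", "97%", "99%", "99.9%"]
gaps = [solarflare[k] - cep_alu[k] for k in levels]
for level, gap in zip(levels, gaps):
    print(f"{level:>6}: {gap} ns")
```

The variance ratio comes out at roughly 11, and the gap grows steadily from the average to the 99.9th percentile.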

The CEPappliance variant in which the data processing logic is implemented in Verilog is 60-70% faster than the variant that implements the same algorithm in the embedded CEPappliance language.

Source code


We have published almost all of the source code used in the testing on GitHub in this repository.

Only the test driver code remains closed, because we hope to monetize it. After all, it measures a system's response time very accurately, and without that information it is almost impossible to build a high-quality HFT solution.

What's next?


The logical next step is to find out whether the difference in delay between these solutions actually matters, for example when trading on the Moscow Stock Exchange. We will cover this in the next article. Looking ahead, though, we can say that even half a microsecond matters!

Source: https://habr.com/ru/post/339702/

