For two years we have been in close contact with HFT traders and developers of HFT solutions, and it bothers us that no one in this community talks openly about their technological successes. Since we build FPGA-based CEPappliance devices, which are applicable to HFT among other things, we are keenly interested in who uses FPGAs in this area and how. One gets the persistent impression that FPGAs in HFT are like sex among teenagers: everyone talks about it, but few actually do it, and fewer still do it well.
In fact, an FPGA gives a noticeable advantage over every other technology (except, perhaps, a specialized ASIC) in HFT and algorithmic trading. Our tests show that an order that reaches the Moscow Exchange 500 nanoseconds earlier than the others will be executed first with a probability of 75%. That is, if an automated trading system is 500 nanoseconds faster at receiving a packet (FAST, FIX or TWIME), parsing it, updating the order book, deciding what to do (create, move or cancel an order), building a packet with the order (FIX or TWIME) and sending it to the exchange, then its order will be executed ahead of the others in 75% of cases.
Our other tests show that advanced network cards and a number of tricks on the CPU can give a tick-to-trade latency of 2-4 microseconds at the 97th percentile and above. Is it possible to get below 1-1.5 microseconds and be faster than the vast majority of HFT traders?
Today, only an FPGA [1] can provide such latency. And those who can benefit from this, HFT traders using FPGAs, are in no hurry to tell their colleagues about it. This makes it hard for us to assess our own solution and its position relative to competitors.
In this article we describe in detail what CEPappliance can do for HFT. Maybe someone else will then find the courage to talk about their own solution...
Main features of the HFT system
To keep the system's response to an exchange signal fast, it has to learn of the signal as early as possible. To do this, the HFT system must:
- listen to several market data streams in which the signal may arrive (orders, trades, statistics, etc.), since any of them may turn out to be faster than the others;
- arbitrate between several (2 or 4) redundant feeds of the same data stream: any feed may be faster than the others, but the data must not be duplicated;
- filter streams by criteria whose values can be changed dynamically;
- build the order book from two streams, for example orders and trades, either of which may be ahead of the other.
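The feed arbitration requirement in the list above can be illustrated with a minimal Python sketch. It is only a software model, not the CEPappliance FPGA logic: the `(sequence number, payload)` packet layout and the function name `arbitrate` are assumptions made for the example. It keeps a high-water mark of sequence numbers, so the first copy of each packet passes and slower copies from the other feeds are dropped.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical packet layout: (sequence number, payload).
Packet = Tuple[int, bytes]


def arbitrate(packets: Iterable[Packet], first_seq: int = 1) -> Iterator[Packet]:
    """Pass each sequence number at most once, taking the fastest copy.

    `packets` is the interleaved arrival order of all redundant feeds.
    Simplification: gaps are not repaired. If a packet arrives ahead of the
    expected number, it is passed on and any later copy of the skipped
    numbers is dropped as if it were a duplicate.
    """
    next_seq = first_seq
    for seq, payload in packets:
        if seq < next_seq:
            continue  # slower duplicate from another feed
        next_seq = seq + 1
        yield seq, payload


# Feed A and feed B interleaved as they might arrive on the wire.
arrivals = [(1, b"a"), (1, b"a"), (2, b"b"), (3, b"c"), (2, b"b"), (3, b"c"), (4, b"d")]
unique = list(arbitrate(arrivals))
```

A real arbiter would also have to decide what to do with gaps (wait for the other feed or request recovery); the sketch leaves that out on purpose.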
In CEPappliance, all of this is implemented directly in the FPGA, including parsing of messages in the FAST, FIX and TWIME (FIX SBE) formats, in which the exchange transmits its signals.
After receiving the signal, the trading strategy has to produce a reaction quickly. In CEPappliance, the strategy algorithm can be implemented either directly in the FPGA or in a high-level language, HLL. HLL programs are compiled and then executed by processors of our own design, which are placed on the same FPGA chip.
Monitoring and control of the robot are a separate matter: tracking the robot's state and its calculations, changing strategy parameters, and so on.
For this, CEPappliance provides:
- logging;
- mirroring of the entire (!) exchange of data between the device and the exchange, unchanged, to a separate SFP+ port on the FPGA board;
- binary adapters for receiving and sending strategy parameter values, which can be processed in HLL; this is considerably faster and cheaper to develop than implementing the same processing in a hardware description language such as Verilog;
- events from the FIX and TWIME adapters about sessions being established or dropped, so that you can track whether the robot has all the connections it needs to work normally.
Architecture
CEPappliance is a board with an Altera Stratix V FPGA that is installed in a PCIe slot of a server (1U height is enough). It communicates with the outside world over 10 Gb Ethernet through SFP+ ports or over PCIe.
All components are implemented directly on the chip and tuned for minimum latency. The CEPappliance firmware contains all available components; which of them are included in the event-processing chain is determined from the host by a special configurator program. First the board is flashed, then started, and then configured. The configurator reads the event-processing program (the schema), compiles it into a memory map of the FPGA's internal memory, and loads it into CEPappliance over TCP. Once the configuration is loaded, the adapters described in the schema start and connect to external systems.
To execute user logic, CEPappliance has processors (CPUs) of our own design. User logic is written in HLL and compiled into firmware for these processors, of which there can be several. The compiler automatically splits the program into independent parts that can run in parallel on different processors. When splitting the program, it takes into account control flow and data dependencies between statements.
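How the HLL compiler actually partitions a program is not described here. As a rough illustration of the idea of splitting by data dependencies only, the following Python sketch groups statements that touch a common variable (using union-find) and treats the resulting groups as independent. The statement names, the variable sets and the function `independent_groups` are all hypothetical, and control flow is ignored entirely.

```python
from typing import Dict, List, Set


def independent_groups(statements: Dict[str, Set[str]]) -> List[List[str]]:
    """Group statements that share any variable.

    `statements` maps a statement name to the set of variables it reads or
    writes. Statements in different groups share no data and could, in
    principle, run on different processors.
    """
    parent = {name: name for name in statements}

    def find(x: str) -> str:
        # Find the group representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # variable -> first statement that used it
    for name, variables in statements.items():
        for var in variables:
            if var in owner:
                parent[find(name)] = find(owner[var])  # merge the two groups
            else:
                owner[var] = name

    groups: Dict[str, List[str]] = {}
    for name in statements:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


program = {
    "s1": {"a", "b"},
    "s2": {"b", "c"},   # shares b with s1
    "s3": {"x"},
    "s4": {"x", "y"},   # shares x with s3
}
parts = independent_groups(program)
```

Here `s1` and `s2` end up in one group and `s3` and `s4` in another, so the two groups are candidates for parallel execution.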
Developing in HLL and compiling HLL programs is much simpler and faster than working in the languages used to program FPGAs.
Because trading strategies can be described in a high-level language, programming CEPappliance is easier than programming an FPGA directly in a hardware description language (Verilog, VHDL, etc.), at comparable latency. Programming a trading strategy on CEPappliance is about as simple as writing the same strategy in C/C++, Java, etc., because CEPappliance comes with ready-made message parsers and builders for the supported protocols, order book assembly and so on, and offers a single-threaded programming model (although at the firmware level execution can be parallel, as mentioned above), while giving lower latency and jitter. The position of CEPappliance relative to other technologies used in HFT, along the axes "ease of programming" and "latency", is shown in the following diagram:

To bring the speed of CEPappliance close to that of a solution implemented entirely in the FPGA, any part of the schema can be implemented directly in Verilog. The <wire /> operator exists for this (see below). Parts that are less speed-critical (for example, controlling and monitoring the trading robot) can stay in HLL. This approach gives the trading robot maximum speed while saving significant development effort.
Schema (program) of an HFT strategy
Here is an example of a schema that, on every update of the order book for the instrument USD000UTSTOM, which is built from the order stream (X-OLR-CURR messages) of the currency market of the Moscow Exchange, sends an order to the exchange priced at the top of the book (as the schema below is written, a sell order at the best bid price):
Sample schema for CEPappliance
<schema name="all-in-fpga">
  <adapters>
    <fast name="moex-fx-fast-orderlog" templates="FIX50SP2-2017-Mar.xml">
      <accept over="udp">
        <on port="16001">
          <multicast group='192.168.200.2'>
            <source ip='192.168.200.1' />
          </multicast>
        </on>
      </accept>
      <trading venue='moex' market='fx' />
    </fast>
    <fix name="moex-fx-fix" version="FIX.4.4">
      <initiate over="tcp">
        <to host="192.168.200.1" port="3336" />
      </initiate>
      <sender>
        <comp id="cep" />
      </sender>
      <target>
        <comp id="moex" />
      </target>
      <heartbeat interval="30sec" />
    </fix>
  </adapters>
  <global>
    <instruments>
      <instrument name='i_main' symbol='USD000UTSTOM' session='CETS' maxpricelevels='1000' />
    </instruments>
    <type name='PriceLevel' def='tuple < money price, uint size >' />
    <constant name='SIDE_BUY' type='uint' value="1" />
    <constant name='SIDE_SELL' type='uint' value="2" />
    <constant name='ACC_TRADE' type='string(32)' value="'ABCDE'" />
    <constant name='ACC_CLIENT' type='string(32)' value="'OPQRSTUVWXYZ'" />
    <variable name='LotSize' type='uint' value="50" />
    <variable name='orderID' type='uint' value="1200000" />
    <variable name='waterline' type='money' value="0" />
  </global>
  <input from='moex-fx-fast-orderlog' as='orderlog'>
    <accept message='X-OLR-CURR' />
    <sequence name='GroupMDEntries'>
      <field name='MDUpdateAction' type='uint' />
      <field name='MDEntryType' type='string(1)' />
      <field name='MDEntryID' type='string(16)' />
      <field name='Symbol' type='string(16)' />
      <field name='MDEntryPx' type='money' />
      <field name='MDEntrySize' type='uint' />
      <field name='TradingSessionID' type='string(8)' />
    </sequence>
  </input>
  <book orders='orderlog' as='fxbook'>
    <accept instruments='i_main' />
    <field name='instrument' type='uint' />
    <field name='time' type='uint' />
    <field name='book' type='tuple < PriceLevel bid, PriceLevel ask >[ 16 ]' />
  </book>
  <map stream="fxbook in" as="algo out" >
    <field name="price" type="money" expression="in.book[0].bid.price" />
    <field name="size" type="uint" expression="in.book[0].bid.size" />
    <program>
      money newWaterline = in.book[0].bid.price + in.book[0].ask.price;
      if(newWaterline == waterline) {
        skip; // do not send a new order
      }
      waterline = newWaterline;
    </program>
  </map>
  <map stream='algo in' as='fix out'>
    <field name='ClOrdID' type='uint' />
    <field name='OrderQty' type='uint' expression="in.size" />
    <field name='Price' type='money' expression="in.price" />
    <program><![CDATA[
      orderID = orderID + 1;
      out.ClOrdID = orderID;
    ]]></program>
  </map>
  <output stream="fix" to="moex-fx-fix" >
    <as message="NewOrderSingle" />
    <format field='MsgSeqNum' as="%5d" />
    <format field='Account' as="{ACC_CLIENT}" />
    <format field='ClOrdID' as="{ACC_TRADE}//%6d" />
    <format field='HandlInst' as="1" />
    <format field='OrderQty' as="%5d" />
    <format field='OrdType' as="2" />
    <format field='Price' as="%11m" />
    <format field='Side' as="{SIDE_SELL}" />
    <format field='Symbol' as="USD000UTSTOM" />
    <format field='TransactTime' as="20170502-17:20:50" />
    <format field='NoTradingSessions' as="1" />
    <format field='TradingSessionID' as="CETS" />
    <format field='NoPartyIDs' as="1" />
    <format field='PartyID' as="{ACC_TRADE}" />
    <format field='PartyIDSource' as="D" />
    <format field='PartyRole' as="3" />
    <format field='SecondaryClOrdID' as="8" />
  </output>
</schema>
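To make the two <program> blocks of the sample schema easier to follow, here is a small Python model of what they appear to do. It is a reading aid under stated assumptions, not HLL and not the device's behavior: we assume that `skip` drops the current event, that the global variables `waterline` and `orderID` keep their values between events, and that the field expressions are evaluated on the same input event as the program. The class and method names are invented for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Level:
    price: Decimal
    size: int


class StrategyModel:
    """Software model of the two <map> operators from the sample schema."""

    def __init__(self) -> None:
        self.waterline = Decimal(0)  # global variable 'waterline'
        self.order_id = 1_200_000    # global variable 'orderID'

    def on_book(self, bid: Level, ask: Level) -> Optional[dict]:
        # First <map>: drop the event if bid + ask has not changed
        # (assumed meaning of `skip`).
        new_waterline = bid.price + ask.price
        if new_waterline == self.waterline:
            return None
        self.waterline = new_waterline
        # Second <map>: assign the next order identifier and copy the
        # price and size taken from the best bid level.
        self.order_id += 1
        return {"ClOrdID": self.order_id, "OrderQty": bid.size, "Price": bid.price}


model = StrategyModel()
bid, ask = Level(Decimal("57.10"), 100), Level(Decimal("57.12"), 50)
first = model.on_book(bid, ask)    # produces an order
second = model.on_book(bid, ask)   # same bid + ask: nothing is sent
third = model.on_book(Level(Decimal("57.11"), 70), ask)
```

Note that the check compares the sum of the best bid and best ask prices, so two different books with the same sum are treated as unchanged; the model reproduces that as written.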
A CEPappliance schema is a set of operators that transform input event streams, received through adapters from external systems, into output event streams, which are passed through adapters to external systems.
An event in a schema is a collection of fields. Each field has a name and a type. Events have the same structure if they have the same set of fields, that is, the number, order, names and types of the fields are the same.
A schema consists of several sections:
- <adapters /> describes the adapters through which the schema receives and sends data; the <fast />, <fix />, <twime /> and <bin /> adapters are available. <fast /> only receives FAST messages in UDP datagrams. <fix /> and <twime /> receive and/or send FIX and TWIME (FIX Simple Binary Encoding) messages over TCP, respectively. <bin /> can both receive and send over TCP or UDP and is typically used to control a trading strategy: request its status, set parameters, etc. Adapters can be described in a separate file, which lets you run the same schema with adapters configured for different environments. For example, you can have one adapter configuration for a test environment and another for production.
- <global /> describes financial instruments (<instruments />), types (<type />), constants (<constant />) and global variables (<variable />) that are available in every other section of the schema.
- <input />, <output />, <book />, <map />, <combine />, <aggregate />, <join /> and <wire /> describe the operators applied to the event streams that pass through the schema:
  - <input /> accepts data from an adapter: it declares the input fields of the schema with their names and types and sets the <accept /> filter on the type of incoming message;
  - <output /> sends data out, packing and formatting it as a message of a given type (<as message='...' />);
  - <book /> builds order books for the given instruments (<accept instruments='...' />). If the book changes after an update (sent by the Moscow Exchange over FAST) is applied, the top of the modified book is passed on to the schema: the 16 best bid prices and the 16 best ask prices, as an array of 16 elements that each hold a bid and an ask PriceLevel. Along with the top comes the timestamp (the time field in <book />) taken from the update packet, which can be used to reconcile the data with other streams (for example, a statistics stream);
  - <map /> transforms data according to the expressions given for the operator's output fields (<field ... expression='...' />) or according to the program in its <program /> section;
  - <combine /> merges several streams into one;
  - <aggregate /> computes aggregate values over a window of events (<window />) whose size is set either by accumulation time or by number of events;
  - <join /> joins two event streams on several fields;
  - <wire /> describes a part of the schema that is implemented in Verilog.
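The behavior described for <book /> can be modeled in a few lines of Python. This is a software sketch only, under several assumptions that are not taken from the product: prices are integers (ticks), updates arrive as signed size changes per price level, and the top is reported only when the visible levels actually differ from the previous ones. The class `BookModel` and its methods are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Level = Tuple[int, int]            # (price in ticks, aggregated size)
Top = Tuple[List[Level], List[Level]]


class BookModel:
    """Aggregated order book that reports its top when the top changes."""

    def __init__(self, depth: int = 16) -> None:
        self.depth = depth
        self.bids: Dict[int, int] = {}  # price -> total size
        self.asks: Dict[int, int] = {}
        self._last_top: Optional[Top] = None

    def top(self) -> Top:
        # Best bids are the highest prices, best asks the lowest.
        bids = sorted(self.bids.items(), reverse=True)[: self.depth]
        asks = sorted(self.asks.items())[: self.depth]
        return bids, asks

    def apply(self, side: str, price: int, delta: int) -> Optional[Top]:
        """Apply a size change; return the new top or None if it is unchanged."""
        levels = self.bids if side == "bid" else self.asks
        size = levels.get(price, 0) + delta
        if size > 0:
            levels[price] = size
        else:
            levels.pop(price, None)  # level is empty, remove it
        top = self.top()
        if top == self._last_top:
            return None
        self._last_top = top
        return top


book = BookModel(depth=2)
u1 = book.apply("bid", 100, 10)
u2 = book.apply("bid", 99, 5)
u3 = book.apply("bid", 98, 7)    # below the visible depth: no update
u4 = book.apply("ask", 101, 3)
```

Sorting on every update is fine for a model but is obviously not how a latency-critical implementation would keep its levels ordered.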
Conceptually, event handling in CEPappliance works as follows. When an adapter receives an event from an external system, it looks for the <input /> through which the event should enter the schema. To do this, it checks the <accept /> filter condition of each <input /> connected to this adapter. If the event satisfies none of these conditions, it is discarded and not processed at all. If a suitable <input /> is found, the event is passed to the operators that take this <input /> as their input. Only the fields declared in the <input /> are kept in the event; the remaining fields are dropped and never reach the schema.
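The dispatch step just described can be sketched in Python as follows. The data structures are hypothetical, and the sketch assumes that the filter is a plain comparison of message type and that the first matching <input /> wins; what happens when several inputs accept the same message is not specified in the text.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Input:
    name: str          # stream name, e.g. 'orderlog'
    accept: str        # accepted message type, e.g. 'X-OLR-CURR'
    fields: List[str]  # fields declared in the <input />


def dispatch(
    message_type: str, event: Dict[str, Any], inputs: List[Input]
) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (stream name, trimmed event), or None if no input accepts it."""
    for inp in inputs:
        if inp.accept == message_type:
            # Keep only the declared fields; everything else is dropped.
            kept = {f: event[f] for f in inp.fields if f in event}
            return inp.name, kept
    return None  # no <accept /> matched: the event is discarded


inputs = [Input("orderlog", "X-OLR-CURR", ["Symbol", "MDEntryPx", "MDEntrySize"])]
raw = {"Symbol": "USD000UTSTOM", "MDEntryPx": "57.10", "MDEntrySize": 100, "RptSeq": 42}
accepted = dispatch("X-OLR-CURR", raw, inputs)
rejected = dispatch("X-TLR-CURR", raw, inputs)
```

In the example, `RptSeq` is not declared in the input, so it does not reach the schema, and the `X-TLR-CURR` message is discarded because no input accepts it.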
Having received one event at its input, each operator of the schema produces another event (or several, as <join /> can, for example), whose set of fields and field values may differ from those of the input event. Only <combine /> creates no new events and passes on what it received unchanged.
Having passed through the schema, been transformed one or more times and possibly multiplied, an event that reaches an <output /> is sent out through the adapter specified in that <output />. The same event can be sent to several different adapters. The <output /> also specifies the message type (for example, NewOrderSingle or OrderCancelReplaceRequest for FIX) in which the event is sent.
The interaction of heterogeneous parts of the scheme
This is the case when one part of the schema is implemented in HLL and other parts in Verilog. The parts implemented in Verilog are described in the schema by the <wire /> operator, so we call them wire logic.
<wire streams="orderbook, stats">
  <param ref="variable1" />
  <param ref="constant1" />
  <out as="todtom">
    <field name='ClOrdID' type='uint' />
    <field name='OrderQty' type='uint' />
    <field name='Price' type='money' />
  </out>
</wire>
The streams from which the wire logic receives data are listed, comma-separated, in the streams attribute.
The <param /> tags describe the parameters that the wire logic uses. Values of constants or variables can be passed as parameters. During execution, the wire logic can change the values of variables through a special Verilog module, after which the modified values become available to the HLL parts of the schema.
Wire logic can produce several result streams, each described by an <out /> tag; they can go straight to the output of the schema (<output /> operators) or to other operators of the schema.
Based on this description, a configuration (memory map) for the wire logic is generated; the user's Verilog module that implements the wire logic accesses it through a special Verilog module.
To make the wire-logic implementation part of the final FPGA firmware, the Verilog code of the wire logic is compiled together with the CEPappliance firmware, which is supplied as a netlist.
What's next?
We plan to develop adapters for connecting to other exchanges. We already have a foundation for this in the form of the FIX, FIX SBE and FAST modules. For example, to receive market data from the Chicago Mercantile Exchange (CME), we need to teach our FIX SBE module to parse packets that contain several FIX SBE messages plus other service information sent by CME.
[1] Specialized ASICs can too, of course, but we do not know of any ASIC that parses, for example, FAST or FIX. Since all our modules are implemented in hardware on the FPGA, we could build a specialized ASIC ourselves and cut the latency even further, but so far we are about 0.5-1 million dollars short of what that would take.