
Event-oriented Python backtesting step by step. Part 2



In the previous article, we discussed what an event-oriented backtesting system is and broke down the class hierarchy that needs to be developed for it. Today we will look at how such systems use market data, both for historical testing and for live trading on an exchange.

Work with market data


One of the goals in building an event-oriented trading system is to minimize the need to write different code for the same task in historical testing and in live trading. Ideally, a single signal generation and portfolio management methodology should be used in both cases. To achieve this, the Strategy object, which generates trading signals (Signals), and the Portfolio object, which generates orders (Orders) based on them, must use the same interface to access market data in both historical testing and real-time operation.

This need has led to a class hierarchy based on the DataHandler object, which gives all subclasses a common interface for passing market data to the rest of the system. In this configuration, any subclass handler can simply be swapped out for another without affecting the components responsible for strategy and portfolio processing.

Such subclasses may include HistoricCSVDataHandler, QuandlDataHandler, SecuritiesMasterDataHandler, InteractiveBrokersMarketFeedDataHandler and so on. Here we consider only a handler for historical CSV data, which loads a CSV file of intraday financial data in bar format (Open, Low, High and Close prices, plus trading Volume and OpenInterest). The data can then be passed bar by bar to the Strategy and Portfolio components on each "heartbeat" of the system, which avoids look-ahead bias.

The first step is to import the required libraries, in particular pandas and the abstract base class tools. Since DataHandler generates MarketEvent events, you also need to import MarketEvent from event.py:

    # data.py

    import datetime
    import os

    import pandas as pd

    from abc import ABCMeta, abstractmethod

    from event import MarketEvent

DataHandler is an abstract base class (ABC), which means it cannot be instantiated directly; only its subclasses can. The rationale is that the ABC defines an interface that every DataHandler subclass must implement, which keeps the subclasses compatible with the other classes that interact with them.

For Python to treat the class as an abstract base class, we set the __metaclass__ property to ABCMeta (this is the Python 2 syntax; in Python 3 the metaclass is passed in the class definition instead, as in class DataHandler(metaclass=ABCMeta)). The @abstractmethod decorator marks a method that must be overridden in subclasses, much like a pure virtual method in C++.

The two methods we are interested in are get_latest_bars and update_bars. The first returns the last N bars up to the current "heartbeat" timestamp, which is useful for calculations in Strategy classes. The second feeds bar information into a new data structure one bar at a time, which strictly rules out look-ahead bias. If an attempt is made to instantiate the class directly, an exception will be raised:

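    # data.py

    class DataHandler(object):
        """
        DataHandler is an abstract base class that provides an interface
        for all derived data handlers (both live and historical).

        The goal of a (derived) DataHandler object is to output a generated
        set of bars (OLHCVI) for each requested symbol.

        This replicates how a live strategy would receive market data, so
        historical and live systems are treated identically by the rest of
        the backtesting suite.
        """

        # Python 2 syntax; in Python 3 write: class DataHandler(metaclass=ABCMeta)
        __metaclass__ = ABCMeta

        @abstractmethod
        def get_latest_bars(self, symbol, N=1):
            """
            Returns the last N bars from the latest_symbol list,
            or fewer if fewer bars are available.
            """
            raise NotImplementedError("Should implement get_latest_bars()")

        @abstractmethod
        def update_bars(self):
            """
            Pushes the latest bar to the latest-symbol structure
            for all symbols in the symbol list.
            """
            raise NotImplementedError("Should implement update_bars()")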
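To see the instantiation check in action, here is a minimal sketch written for Python 3, where the metaclass is passed in the class statement. IncompleteHandler, ListDataHandler and can_instantiate are hypothetical names invented for this illustration; they are not part of the article's code.

```python
from abc import ABCMeta, abstractmethod


class DataHandler(metaclass=ABCMeta):
    """Cut-down copy of the interface, for illustration only."""

    @abstractmethod
    def get_latest_bars(self, symbol, N=1):
        raise NotImplementedError("Should implement get_latest_bars()")

    @abstractmethod
    def update_bars(self):
        raise NotImplementedError("Should implement update_bars()")


class IncompleteHandler(DataHandler):
    """Hypothetical subclass that forgets to implement get_latest_bars."""

    def update_bars(self):
        pass


class ListDataHandler(DataHandler):
    """Hypothetical subclass that implements both abstract methods."""

    def __init__(self):
        self.bars = []

    def get_latest_bars(self, symbol, N=1):
        return self.bars[-N:]

    def update_bars(self):
        self.bars.append(("DUMMY", len(self.bars)))


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


for cls in (DataHandler, IncompleteHandler, ListDataHandler):
    print(cls.__name__, can_instantiate(cls))
```

Only the subclass that overrides every abstract method can be created, which is what lets the rest of the system rely on the interface.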

With the DataHandler class defined, the next step is to create a handler for historical CSV files. HistoricCSVDataHandler takes multiple CSV files (one per financial instrument) and converts them into pandas DataFrames.

The handler needs several parameters: the event queue (Event Queue) to which MarketEvent objects are published, the absolute path to the CSV files, and the list of symbols. Here is the class initialization:

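    # data.py

    class HistoricCSVDataHandler(DataHandler):
        """
        HistoricCSVDataHandler reads a CSV file for each requested symbol
        from disk and provides an interface for obtaining the "latest" bar,
        in the same way a live trading interface would.
        """

        def __init__(self, events, csv_dir, symbol_list):
            """
            Initialises the historical data handler with the location of
            the CSV files and a list of symbols. It is assumed that every
            file is named 'symbol.csv', where symbol is an entry in the list.

            Parameters:
            events - The event queue.
            csv_dir - Absolute path to the directory with the CSV files.
            symbol_list - A list of symbol strings.
            """
            self.events = events
            self.csv_dir = csv_dir
            self.symbol_list = symbol_list

            self.symbol_data = {}
            self.latest_symbol_data = {}
            self.continue_backtest = True

            self._open_convert_csv_files()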

The handler will try to open files named "SYMBOL.csv", where SYMBOL is the ticker of the instrument. The format used here matches the one provided by the data vendor DTN IQFeed, but it can easily be modified to work with other formats. Opening the files is handled by the _open_convert_csv_files method.

One advantage of using pandas to store data inside HistoricCSVDataHandler is that the indexes of all tracked symbols can be merged. Missing data points can then be padded forward (or interpolated), which is useful for bar-by-bar comparison of symbols (sometimes necessary in mean-reversion strategies). The union and reindex methods are used to combine the indexes:

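    # data.py

        def _open_convert_csv_files(self):
            """
            Opens the CSV files from the data directory and converts them
            into pandas DataFrames stored in a symbol dictionary.

            It is assumed that the data follows the DTN IQFeed format,
            so the column layout is fixed here.
            """
            comb_index = None
            for s in self.symbol_list:
                # Load the CSV file, indexed on the datetime column
                self.symbol_data[s] = pd.read_csv(
                    os.path.join(self.csv_dir, '%s.csv' % s),
                    header=0, index_col=0,
                    names=['datetime','open','low','high','close','volume','oi']
                )

                # Combine the indexes so that values can be padded forward
                if comb_index is None:
                    comb_index = self.symbol_data[s].index
                else:
                    # union() returns a new index, so the result must be reassigned
                    comb_index = comb_index.union(self.symbol_data[s].index)

                # Start with an empty list of latest bars for this symbol
                self.latest_symbol_data[s] = []

            # Reindex the dataframes on the combined index
            for s in self.symbol_list:
                self.symbol_data[s] = self.symbol_data[s].reindex(
                    index=comb_index, method='pad'
                ).iterrows()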
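The effect of union and reindex is easiest to see on a tiny data set. The sketch below uses two made-up in-memory CSV files (the tickers AAA and BBB, the prices and the load helper are all invented for illustration) and the same read_csv arguments as the handler. BBB has no bar at 09:31, so after reindexing it receives a copy of its 09:30 bar.

```python
import io

import pandas as pd

COLUMNS = ["datetime", "open", "low", "high", "close", "volume", "oi"]

# Made-up sample data: BBB is missing the 09:31 bar.
AAA_CSV = """datetime,open,low,high,close,volume,oi
2015-01-02 09:30:00,10.0,9.5,10.5,10.0,1000,0
2015-01-02 09:31:00,10.0,10.0,11.0,10.5,1100,0
2015-01-02 09:32:00,10.5,10.0,11.0,11.0,1200,0
"""
BBB_CSV = """datetime,open,low,high,close,volume,oi
2015-01-02 09:30:00,20.0,19.5,20.5,20.0,500,0
2015-01-02 09:32:00,20.0,20.0,21.0,20.5,700,0
"""


def load(text):
    """Hypothetical helper: parse CSV text the way the handler parses files."""
    return pd.read_csv(io.StringIO(text), header=0, index_col=0, names=COLUMNS)


aaa = load(AAA_CSV)
bbb = load(BBB_CSV)

# union() returns a new index; neither original index is modified.
comb_index = aaa.index.union(bbb.index)

# method="pad" fills each missing timestamp with the previous known bar.
bbb_padded = bbb.reindex(index=comb_index, method="pad")

print(bbb_padded[["close", "volume"]])
```

Note that padding can only copy an earlier bar forward, so a symbol whose data starts later than the others will still have missing values at the beginning of the combined index.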

The _get_new_bar method creates a generator that yields a formatted version of the bar data. Each subsequent call therefore produces a new bar, until the end of the symbol's data is reached:

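    # data.py

        def _get_new_bar(self, symbol):
            """
            Returns the latest bar from the data feed as a tuple of
            (symbol, datetime, open, low, high, close, volume).
            """
            # Each item from iterrows() is an (index label, row) pair
            for stamp, row in self.symbol_data[symbol]:
                yield (symbol,
                       datetime.datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S'),
                       row['open'], row['low'], row['high'],
                       row['close'], row['volume'])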
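It may not be obvious why calling the method repeatedly advances through the data instead of restarting. The reason is that the generator wraps a shared iterator, which remembers its position. A minimal standard-library sketch of the same pattern, with made-up rows and a hypothetical free function get_new_bar standing in for the method:

```python
import datetime

# A shared iterator over (timestamp string, close price) rows; made-up data.
rows = iter([
    ("2015-01-02 09:30:00", 10.0),
    ("2015-01-02 09:31:00", 10.5),
])


def get_new_bar(symbol, row_iter):
    """Yield (symbol, datetime, close) tuples from a shared row iterator."""
    for stamp, close in row_iter:
        parsed = datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        yield (symbol, parsed, close)


# A fresh generator is created on every call, but the underlying iterator
# keeps its position, so each call returns the next bar.
first = next(get_new_bar("AAA", rows))
second = next(get_new_bar("AAA", rows))

# Once the rows run out, next() raises StopIteration.
try:
    next(get_new_bar("AAA", rows))
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```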


The first abstract method from DataHandler that needs to be implemented is get_latest_bars. It simply returns a list of the last N bars from the latest_symbol_data structure. Setting N = 1 returns the current bar (wrapped in a list):

    # data.py

        def get_latest_bars(self, symbol, N=1):
            """
            Returns the last N bars from the latest_symbol list,
            or fewer if fewer bars are available.
            """
            try:
                bars_list = self.latest_symbol_data[symbol]
            except KeyError:
                print("That symbol is not available in the historical data set.")
            else:
                return bars_list[-N:]

The last method is update_bars, the second abstract method from DataHandler. It appends the latest bar for each symbol to latest_symbol_data and then puts a MarketEvent into the queue:

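    # data.py

        def update_bars(self):
            """
            Pushes the latest bar to the latest_symbol_data structure
            for all symbols in the symbol list.
            """
            for s in self.symbol_list:
                try:
                    # next(...) works in both Python 2.6+ and Python 3
                    bar = next(self._get_new_bar(s))
                except StopIteration:
                    self.continue_backtest = False
                else:
                    if bar is not None:
                        self.latest_symbol_data[s].append(bar)
            self.events.put(MarketEvent())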
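The interplay between update_bars, get_latest_bars and the continue_backtest flag can be sketched without pandas. MiniHandler below is a hypothetical, stripped-down stand-in for HistoricCSVDataHandler that reads bars from in-memory lists, and the string "MARKET" stands in for a MarketEvent object; the loop is only an assumption about how a backtest might drive the handler.

```python
import queue


class MiniHandler:
    """Hypothetical in-memory stand-in for HistoricCSVDataHandler."""

    def __init__(self, events, bars_by_symbol):
        self.events = events
        # One shared iterator per symbol, so next() resumes where it stopped.
        self.symbol_data = {s: iter(b) for s, b in bars_by_symbol.items()}
        self.latest_symbol_data = {s: [] for s in bars_by_symbol}
        self.continue_backtest = True

    def get_latest_bars(self, symbol, N=1):
        # Slicing tolerates N larger than the number of bars seen so far.
        return self.latest_symbol_data[symbol][-N:]

    def update_bars(self):
        for s, row_iter in self.symbol_data.items():
            try:
                bar = next(row_iter)
            except StopIteration:
                self.continue_backtest = False
            else:
                self.latest_symbol_data[s].append(bar)
        self.events.put("MARKET")  # placeholder for MarketEvent()


events = queue.Queue()
handler = MiniHandler(events, {"AAA": [("AAA", 1, 10.0), ("AAA", 2, 10.5)]})

heartbeats = 0
while True:
    handler.update_bars()
    if not handler.continue_backtest:
        break
    heartbeats += 1
    # At this point a strategy can only see bars up to the current heartbeat.
    print(heartbeats, handler.get_latest_bars("AAA", N=5))
```

Because bars are appended one heartbeat at a time, code that reads latest_symbol_data has no way to look at future bars.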

We now have a DataHandler: a dedicated object that the other components of the system use to track market data. Strategy, Portfolio and ExecutionHandler objects all require current market information, so it makes sense to handle it centrally and avoid duplicating storage.

From information to trading signal: strategy


The Strategy object encapsulates all calculations on market data that produce advisory signals for the Portfolio object. At this stage of developing the event-oriented backtester, there is no concept of the indicators or filters used in technical analysis. A separate data structure could be created to implement them, but that is beyond the scope of this article.

The strategy hierarchy is relatively simple: it consists of an abstract base class with a single virtual method for creating SignalEvent objects. To create the strategy hierarchy, you need to import NumPy, pandas, the Queue module, the abstract base class tools and SignalEvent:

    # strategy.py

    import datetime
    import numpy as np
    import pandas as pd
    import Queue  # Python 2 module name; it is called queue in Python 3

    from abc import ABCMeta, abstractmethod

    from event import SignalEvent

The abstract base class Strategy defines a virtual method calculate_signals . It is used to handle the creation of SignalEvent objects based on updated market data:

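    # strategy.py

    class Strategy(object):
        """
        Strategy is an abstract base class that provides an interface
        for all derived strategy handling objects.

        The goal of a (derived) Strategy object is to generate Signal
        objects for particular symbols based on the bars (OLHCVI)
        supplied by a DataHandler object.

        It is designed to work with both historical and live data, since
        the Strategy object does not care where the data comes from.
        """

        # Python 2 syntax; in Python 3 write: class Strategy(metaclass=ABCMeta)
        __metaclass__ = ABCMeta

        @abstractmethod
        def calculate_signals(self, event):
            """
            Provides the mechanism for calculating the list of signals.
            The event parameter matches the signature used by subclasses.
            """
            raise NotImplementedError("Should implement calculate_signals()")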

Defining the abstract base class Strategy is quite simple. Our first example of a Strategy subclass is a buy-and-hold strategy, implemented in the class BuyAndHoldStrategy. It buys a specific instrument on a particular day and holds the position, so only one signal is generated per instrument.

The constructor (__init__) requires the market data handler bars and the event queue object events:

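    # strategy.py

    class BuyAndHoldStrategy(Strategy):
        """
        An extremely simple strategy that goes LONG on every symbol as
        soon as a bar is received and never exits the position.

        It is mainly used to test the Strategy class and as a benchmark
        against which other strategies can be compared.
        """

        def __init__(self, bars, events):
            """
            Initialises the buy and hold strategy.

            Parameters:
            bars - The DataHandler object that provides bar information.
            events - The event queue object.
            """
            self.bars = bars
            self.symbol_list = self.bars.symbol_list
            self.events = events

            # Once the buy and hold signal is given, these are set to True
            self.bought = self._calculate_initial_bought()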

When the BuyAndHoldStrategy strategy is initialised, the bought dictionary contains one key per instrument, each set to False. When an instrument is bought (a long position is opened), its value is switched to True. This lets the Strategy object know whether a position is already open:

    # strategy.py

        def _calculate_initial_bought(self):
            """
            Adds a key to the bought dictionary for every symbol
            and sets its value to False.
            """
            bought = {}
            for s in self.symbol_list:
                bought[s] = False
            return bought
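For reference, the same dictionary can be built more compactly. This is only a sketch of equivalent spellings, using made-up tickers; the explicit loop in the article's code behaves identically.

```python
symbol_list = ["AAA", "BBB"]  # made-up tickers for illustration

# Explicit loop, as in _calculate_initial_bought
bought_loop = {}
for s in symbol_list:
    bought_loop[s] = False

# Equivalent one-liners
bought_comp = {s: False for s in symbol_list}
bought_fromkeys = dict.fromkeys(symbol_list, False)

# Flipping one flag marks that symbol as bought without touching the others.
bought_comp["AAA"] = True

print(bought_loop, bought_fromkeys, bought_comp)
```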

The virtual method calculate_signals is implemented in this class. The method loops over all the instruments in the list and gets the latest bar from the bars handler. It then checks whether the instrument has already been "bought" (whether we are in the market for it or not), and if not, a SignalEvent object is created. The signal is placed in the event queue, and the bought dictionary is updated accordingly (True for the purchased instrument):

    # strategy.py

        def calculate_signals(self, event):
            """
            For "Buy and Hold", generates a single signal per symbol
            and then no further signals, so the strategy stays
            constantly long from the first bar onwards.

            Parameters:
            event - A MarketEvent object.
            """
            if event.type == 'MARKET':
                for s in self.symbol_list:
                    bars = self.bars.get_latest_bars(s, N=1)
                    if bars is not None and bars != []:
                        if not self.bought[s]:
                            # (Symbol, Datetime, Type = LONG, SHORT or EXIT)
                            signal = SignalEvent(bars[0][0], bars[0][1], 'LONG')
                            self.events.put(signal)
                            self.bought[s] = True
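To check that the strategy really emits only one signal per instrument, the logic can be exercised with stand-in classes. In the Python 3 sketch below, MarketEvent and SignalEvent are simplified stand-ins for the classes in event.py (their attribute names are assumptions), StubBars is a hypothetical data handler that always returns one fixed bar, and BuyAndHoldStrategy is a condensed version of the class described in this article.

```python
import datetime
import queue


class MarketEvent:
    """Stand-in for MarketEvent from event.py (assumed shape)."""

    def __init__(self):
        self.type = "MARKET"


class SignalEvent:
    """Stand-in for SignalEvent; the attribute names are assumptions."""

    def __init__(self, symbol, timestamp, signal_type):
        self.type = "SIGNAL"
        self.symbol = symbol
        self.datetime = timestamp
        self.signal_type = signal_type


class StubBars:
    """Hypothetical data handler returning one fixed bar per symbol."""

    def __init__(self, symbol_list):
        self.symbol_list = symbol_list

    def get_latest_bars(self, symbol, N=1):
        stamp = datetime.datetime(2015, 1, 2, 9, 30)
        return [(symbol, stamp, 10.0, 9.5, 10.5, 10.2, 1000)][-N:]


class BuyAndHoldStrategy:
    """Condensed version of the buy-and-hold strategy."""

    def __init__(self, bars, events):
        self.bars = bars
        self.symbol_list = bars.symbol_list
        self.events = events
        self.bought = {s: False for s in self.symbol_list}

    def calculate_signals(self, event):
        if event.type != "MARKET":
            return
        for s in self.symbol_list:
            bars = self.bars.get_latest_bars(s, N=1)
            if bars and not self.bought[s]:
                self.events.put(SignalEvent(bars[0][0], bars[0][1], "LONG"))
                self.bought[s] = True


events = queue.Queue()
strategy = BuyAndHoldStrategy(StubBars(["AAA", "BBB"]), events)

# Three market heartbeats, but each symbol should be bought only once.
for _ in range(3):
    strategy.calculate_signals(MarketEvent())

signals = []
while not events.empty():
    signals.append(events.get())

print([(sig.symbol, sig.signal_type) for sig in signals])
```

The bought flags act as the strategy's only state: after the first heartbeat every flag is True, so later MarketEvent objects produce no further signals.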

This is a very simple strategy, but it is enough to demonstrate the nature of an event-oriented strategy hierarchy. In the next article we will look at more complex strategies, such as pairs trading. We will also discuss the creation of a Portfolio hierarchy that tracks profit and loss (PnL) by position.

To be continued…

P.S. Earlier in our blog on Habr we covered the various stages of developing trading systems. There are also online courses on this topic.

Source: https://habr.com/ru/post/264141/

